Abstract

Let $A$ be a matrix, $c$ be any linear objective function and $x$ be a fractional vector, say an LP solution to some discrete optimization problem. Then a recurring task in theoretical computer science (and in approximation algorithms in particular) is to obtain an integral vector $y$ such that $Ax \approx Ay$ and $c^T y$ exceeds $c^T x$ by only a moderate factor.

We give a new randomized rounding procedure for this task, provided that $A$ has bounded $\Delta$-approximate entropy. This property means that for uniformly chosen random signs $\chi(j) \in \{\pm 1\}$ on any subset of the columns, the outcome $A\chi$ can be approximately described using a sub-linear number of bits in expectation.

To achieve this result, we modify well-known techniques from the field of discrepancy theory. In particular, we rely on Beck's entropy method, which to the best of our knowledge has never been used before in the context of approximation algorithms. Our result can be made constructive using the Bansal framework based on semidefinite programming.

We demonstrate the versatility of our procedure by rounding fractional solutions to column-based linear programs for some generalizations of Bin Packing. For example we obtain a polynomial time $OPT + O(\log^2 OPT)$ approximation for Bin Packing With Rejection and the first AFPTAS for the Train Delivery problem.



* Supported by the Alexander von Humboldt Foundation within the Feodor Lynen program. 



1 Introduction 



Many approximation algorithms are based on linear programming relaxations; for the sake of concreteness, say on formulations like

$$\min\{c^T x \mid Ax \ge b,\ x \ge 0\},$$

with $A \in \mathbb{R}^{n \times m}$. Several techniques have been developed to round a fractional LP solution $x$ to an integer one; the textbooks [Vaz01, WS11] provide a good overview of the most common approaches. The aim of this paper is to introduce a new LP rounding technique that we term entropy rounding.

To describe our method, we consider the random variable $A\chi$, where $\chi \in \{\pm 1\}^m$ is a uniformly chosen random coloring of the columns of $A$. Suppose that $A$ has the property that one can approximately encode the outcome of $A\chi$ up to an additive error of $\Delta$ with at most $m/5$ bits in expectation. In other words, we suppose that we can find some arbitrary function $f$ such that $\|A\chi - f(\chi)\|_\infty \le \Delta$ and the entropy of the random variables $f(\chi)$ can be bounded by $m/5$. Note that the entropy could never exceed $m$, hence we only need to save a constant factor by allowing an approximation error. One possible choice could be $f(\chi) = 2\Delta \lceil \frac{A\chi}{2\Delta} \rfloor$, meaning that we round every entry of $A\chi$ to the nearest multiple of $2\Delta$. To bound the entropy of $f(\chi)$ one can then use standard concentration bounds since the values $A_i\chi = \sum_{j=1}^m A_{ij}\chi(j)$ are sums of independently distributed random variables (here $A_i$ denotes the $i$th row of $A$). If this holds also for any submatrix of $A$, we say that $A$ has bounded $\Delta$-approximate entropy.
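For very small matrices the entropy of this particular rounding function can be computed by plain enumeration. The following Python sketch is only an illustration under that assumption (it enumerates all $2^m$ colorings, so it is exponential in $m$); the function names `rounded_outcome` and `entropy_of_rounding` are hypothetical and not taken from the paper.

```python
import itertools
import math
from collections import Counter


def rounded_outcome(A, chi, delta):
    """f(chi): each entry of A*chi rounded to the nearest multiple of 2*delta.

    Ties are rounded down, via ceil(z - 1/2).
    """
    out = []
    for row in A:
        value = sum(a * c for a, c in zip(row, chi))
        out.append(2 * delta * math.ceil(value / (2 * delta) - 0.5))
    return tuple(out)


def entropy_of_rounding(A, delta):
    """Entropy (in bits) of f(chi) for chi uniform in {-1,+1}^m, by enumeration."""
    m = len(A[0])
    counts = Counter(
        rounded_outcome(A, chi, delta)
        for chi in itertools.product((-1, 1), repeat=m)
    )
    total = 2 ** m
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

A coarse grid (large `delta`) collapses all outcomes into one value and gives entropy 0, while a fine grid keeps more information and the entropy grows, but never beyond $m$ bits.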

But why would it be useful to have this property for $A$? Since there are $2^m$ many colorings $\chi$, there must be an exponential number of colorings $\chi^{(1)}, \ldots, \chi^{(\ell)} \in \{\pm 1\}^m$ which are similar w.r.t. $A$, i.e. $\|A\chi^{(i)} - A\chi^{(i')}\|_\infty \le \Delta$. Since there are so many similar colorings, we can pick two of them (say $\chi^{(i)}, \chi^{(i')}$) that differ in at least half of the entries and define $\chi := \frac{1}{2}(\chi^{(i)} - \chi^{(i')})$ as the difference of those colorings. Then $\chi$ is a half-coloring, i.e. it has entries in $\{-1, 0, 1\}$, but at least half of the entries are non-zero and furthermore $\|A\chi\|_\infty \le \Delta$.

However, our aim was to find a vector $y \in \{0,1\}^m$ such that $Ay \approx Ax$. We will iteratively obtain half-colorings and use them to update $x$, each time reducing its fractionality. Thus, we consider the least value bit in any entry of $x$; say this is bit $K$. Let $J \subseteq [m]$ be the set of indices where this bit is set to one and let $A^J \subseteq A$ be the submatrix of the corresponding columns. Then by the argument above, there is a half-coloring $\chi \in \{0, \pm 1\}^J$ such that $\|A^J\chi\|_\infty \le \Delta$. We use this information to round our fractional solution to $x' := x + (\frac{1}{2})^K \chi$, meaning that we delete the $K$th bit of those entries $j$ that have $\chi(j) = -1$; we round the entry up if $\chi(j) = 1$ and we leave it unchanged if $\chi(j) = 0$. After iterating this at most $\log m$ times, the $K$th bit of all entries of $x$ will be 0. Hence after at most $K \cdot \log m$ iterations, we will end up in a 0/1 vector that we term $y$. This vector satisfies $\|Ax - Ay\|_\infty \le \sum_{k=1}^{K} (\frac{1}{2})^k \cdot \log m \cdot \Delta \le \log m \cdot \Delta$.
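The bit-by-bit update described above can be sketched in a few lines of Python. This is a toy illustration only: the half-coloring is found by brute force over $\{-1,0,1\}^J$ (minimizing $\|A^J\chi\|_\infty$) rather than by the entropy argument, so it is usable only for a handful of columns, and all names (`lowest_set_bit`, `best_half_coloring`, `round_bitwise`) are hypothetical. Entries of `x` are assumed to be exact dyadic fractions with at most `K` bits, and `A` is assumed to have at least one row.

```python
from fractions import Fraction
import itertools


def lowest_set_bit(x, K):
    """Largest k <= K such that some entry of x has its k-th binary digit set."""
    for k in range(K, 0, -1):
        if any(int(v * 2**k) % 2 == 1 for v in x):
            return k
    return None  # x is already integral


def best_half_coloring(A, J):
    """Brute force: chi in {-1,0,1}^J with >= |J|/2 non-zeros minimizing max_i |A_i chi|."""
    best, best_err = None, None
    for signs in itertools.product((-1, 0, 1), repeat=len(J)):
        if 2 * sum(1 for s in signs if s != 0) < len(J):
            continue  # not a half-coloring
        err = max(abs(sum(row[j] * s for j, s in zip(J, signs))) for row in A)
        if best_err is None or err < best_err:
            best, best_err = signs, err
    return best


def round_bitwise(A, x, K):
    """Repeatedly clear the least significant set bit using half-colorings."""
    x = list(x)
    while True:
        k = lowest_set_bit(x, K)
        if k is None:
            return [int(v) for v in x]
        J = [j for j, v in enumerate(x) if int(v * 2**k) % 2 == 1]
        chi = best_half_coloring(A, J)
        for j, s in zip(J, chi):
            x[j] += Fraction(s, 2**k)  # -1 deletes bit k, +1 rounds up, 0 keeps it
```

For instance, `round_bitwise([[1, 1, 1, 1], [1, 1, 0, 0]], [Fraction(1, 4), Fraction(3, 4), Fraction(1, 2), Fraction(1, 2)], 2)` yields a 0/1 vector. Each pass clears bit $k$ in at least half of the entries of $J$, which is why every bit level needs only logarithmically many passes.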

Let us illustrate this abstract situation with a concrete example. For the very classical Bin Packing problem, the input consists of a sorted list of item sizes $1 \ge s_1 \ge \ldots \ge s_n > 0$ and the goal is to assign all items to a minimum number of bins of size 1. Let $\mathcal{S} = \{S \subseteq [n] \mid \sum_{i \in S} s_i \le 1\}$ be the set system containing all feasible patterns and let $\mathbf{1}_S$ denote the characteristic vector of a set $S$. A well-studied column-based LP relaxation for Bin Packing is

$$\min\Big\{\mathbf{1}^T x \ \Big|\ \sum_{S \in \mathcal{S}} x_S \mathbf{1}_S = \mathbf{1},\ x \ge 0\Big\} \qquad (1)$$

(see e.g. [Eis57, GG61, KK82]). In an integral solution, the variable $x_S$ tells whether a bin should be packed exactly with the items in $S$. We want to argue why our method is applicable here. Thus let $x$ be a fractional solution to (1). In order to keep the notation simple let us assume for now that all items have size between $\frac{1}{k}$ and $\frac{2}{k}$. Our choice for matrix $A$ is as follows: Let $A_i$ be the sum of the first $i$ rows of the constraint matrix of (1), i.e. $A_{iS} = |S \cap \{1, \ldots, i\}|$. By definition, for an integral vector $y$, $A_i y$ denotes the number of slots that $y$ reserves for items in $1, \ldots, i$. If there are fewer than $i$ slots reserved, we term this a deficit. Since we assumed that the items are sorted according to their size, a vector $y \in \{0,1\}^{\mathcal{S}}$ will correspond to a feasible solution if there is no deficit for any interval $1, \ldots, i$.
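The cumulated matrix and the deficits are easy to compute explicitly for a small instance. The sketch below is only an illustration; the helper names `cumulated_pattern_matrix` and `deficits` are hypothetical, items are numbered $1, \ldots, n$ in order of non-increasing size, and patterns are given as Python sets of item indices.

```python
def cumulated_pattern_matrix(patterns, n):
    """Entry (i, S) is |S ∩ {1,...,i}|; rows are indexed by i = 1,...,n."""
    return [
        [sum(1 for item in S if item <= i) for S in patterns]
        for i in range(1, n + 1)
    ]


def deficits(patterns, y, n):
    """For each prefix {1,...,i}: max(0, i - number of slots y reserves for it)."""
    A = cumulated_pattern_matrix(patterns, n)
    slots = [sum(a * yj for a, yj in zip(row, y)) for row in A]
    return [max(0, i - s) for i, s in zip(range(1, n + 1), slots)]
```

With `patterns = [{1, 2}, {3}, {2, 3}]` and `n = 3`, choosing `y = [1, 1, 0]` gives no deficit at all, whereas `y = [0, 1, 1]` leaves a deficit for the prefixes $\{1\}$ and $\{1, 2\}$ because no slot is large enough for item 1.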

To understand why this matrix $A$ has the needed property, we can add some artificial rows until consecutive rows differ in exactly one entry; say $n' \le mk$ is the new number of rows. Then observe that the sequence $A_1\chi, A_2\chi, \ldots, A_{n'}\chi$ describes a symmetric random walk with step size 1 on the real axis. We imagine all multiples of $\Delta$ as "mile stones" and choose $f_i(\chi)$ as the last such mile stone that was crossed by the first $i$ steps of the random walk (i.e. by $A_1\chi, \ldots, A_i\chi$). For an independent random walk it would take $\Theta(\Delta^2)$ iterations in expectation until the walk covers a distance of $\Delta$, thus we expect that the sequence $f_1(\chi), \ldots, f_{n'}(\chi)$ changes its value only every $\Theta(\Delta^2)$ steps and consequently the entropy of this sequence cannot be large. But up to $k$ steps of the random walk correspond to the same column of $A$ and depend on each other. Using more involved arguments, we will still be able to show that for $\Delta := \Theta(k)$, the entropy of the sequence $f_1(\chi), \ldots, f_{n'}(\chi)$ is bounded by $m/5$.

More generally, we allow that the parameter $\Delta$ depends on the row $i$ of $A$. Then the same arguments go through for $\Delta_i := \Theta(\frac{1}{s_i})$, where $s_i$ is the size of item $i$. Thus our rounding procedure can be applied to a fractional Bin Packing solution $x$ to provide an integral vector $y$ with $|A_i x - A_i y| \le O(\log n) \cdot \Delta_i$. The deficits can be eliminated by buying $O(\log^2 n)$ extra bins in total.

The entropy-based argument which guarantees the existence of proper half-colorings $\chi$ is widely termed "Beck's Entropy Method" from the field of discrepancy theory. This area studies the discrepancy of set systems, i.e. the maximum difference of "red" and "blue" elements in any set for the best 2-coloring. Formally, the discrepancy of a set system $\mathcal{S} \subseteq 2^{[n]}$ is defined as

$$\mathrm{disc}(\mathcal{S}) = \min_{\chi : [n] \to \{\pm 1\}} \max_{S \in \mathcal{S}} |\chi(S)|.$$

In fact, for a variety of problems, the entropy method is the only known technique to derive the best bounds (see e.g. [Spe85, SST]).
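To make the definition concrete, the discrepancy of a tiny set system can be evaluated directly from the min-max formula. The sketch below simply tries all $2^n$ colorings, so it is meant purely as an illustration; the function name `discrepancy` and the 0-based element numbering are choices made here, not notation from the paper.

```python
import itertools


def discrepancy(sets, n):
    """min over chi in {-1,+1}^n of max over S of |sum_{e in S} chi(e)|."""
    best = None
    for chi in itertools.product((-1, 1), repeat=n):
        worst = max(abs(sum(chi[e] for e in S)) for S in sets)
        if best is None or worst < best:
            best = worst
    return best
```

For the three edges of a triangle, `discrepancy([{0, 1}, {1, 2}, {0, 2}], 3)` is 2, since some edge is always monochromatic, while a path with two edges can be colored perfectly balanced.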

1.1 Related work 

Most approximation algorithms that aim at rounding a fractional solution to an integral one use one of the following common techniques: A classical application of the properties of basic solutions yields a 2-approximation for Unrelated Machine Scheduling [LST87]. Iterative rounding was e.g. used in a 2-approximation for a wide class of network design problems, like Steiner Network [Jai98]. Randomized rounding can be used for an $O(\log n / \log\log n)$-approximation for Min Congestion [RT87] or Atsp [AGM+10]. A combination of both techniques provides the currently best approximation guarantee for Steiner Tree [BGRS10]. The dependent rounding scheme was successfully applied to LPs of an assignment type [GKPS06]. Sophisticated probabilistic techniques like the Lovász Local Lemma were for example used to obtain $O(1)$-approximation for the Santa Claus problem [Fei08, HSS10].

However, to the best of our knowledge, the entropy method has never been used for the purpose of approximation algorithms, while being very popular for finding low discrepancy colorings. For the sake of comparison: for a general set system $\mathcal{S}$ with $n$ elements, a random coloring provides an easy bound of $\mathrm{disc}(\mathcal{S}) \le O(\sqrt{n \log(2|\mathcal{S}|)})$ (see e.g. [Mat99]). But using the Entropy method, this can be improved to $\mathrm{disc}(\mathcal{S}) \le O(\sqrt{n \log(2|\mathcal{S}|/n)})$ for $n \le |\mathcal{S}|$ [Spe85]. This bound is tight if no more properties of the set system are specified. Other applications of this method give an $O(\sqrt{t} \log n)$ bound if no element is in more than $t$ sets [Sri97] and an $O(\sqrt{k} \log n)$ bound for the discrepancy of $k$ permutations. For the first quantity, alternative proof techniques give bounds of $2t - 1$ [BF81] and $O(\sqrt{t \cdot \log n})$ [Ban98]. We recommend the book of Matoušek [Mat99] (Chapter 4) for an introduction to discrepancy theory.

The entropy method itself is purely existential due to the use of the pigeonhole principle. But in a very recent breakthrough, Bansal [Ban10] showed how to obtain colorings matching the Spencer [Spe85] and Srinivasan [Sri97] bounds, by considering a random walk guided by the solution of a semidefinite program.

Our contributions 

In this work, we present a very general rounding theorem which for a given vector $x \in [0,1]^m$, matrices $A$ and $B$, weights $\mu_i$ and an objective function $c$, computes a binary random vector $y$ which (1) preserves all expectations; (2) guarantees worst case bounds on $|A_i x - A_i y|$ and $|B_i x - B_i y|$ and (3) provides strong tail bounds. The bounds for $A$ depend on the entropy of random functions that approximately describe the outcomes of random colorings of subsets of columns of $A$, while the bounds for rows of $B$ are functions of the weights $\mu_i$.

We use this rounding theorem to obtain better approximation guarantees for several well studied Bin 
Packing generalizations. In fact, so far all asymptotic FPTAS results for Bin Packing related problems 
in the literature are based on rounding a basic solution to a column-based LP using its sparse support. 
We give the first alternative method to round such LPs, which turns out to be always at least as good as 
the standard technique (e.g. for classical Bin Packing) and significantly stronger for several problems. 
We demonstrate this by providing the following results: 

• A randomized polynomial time $OPT + O(\log^2 OPT)$ algorithm for Bin Packing With Rejection, where in contrast to classical Bin Packing, each item can either be packed into a bin or rejected at a given cost. Our result improves over the previously best bound of $OPT + \frac{OPT}{(\log OPT)^{1-o(1)}}$ [EL10].

• We give the first (randomized) AFPTAS for the Train Delivery problem, which is a combination of a one-dimensional vehicle routing problem and Bin Packing. In fact, our algorithm produces solutions of cost $OPT + O(OPT^{3/5})$ (see [DMM10] for an APTAS).

It would not be difficult to extend this list with further variants,¹ but we also believe that the method will find applications that are not related to Bin Packing.

Organization 

We recall some tools and notation in Section 2. In Section 3 we revisit results from discrepancy theory and modify them for our purposes. In Section 4 we show our general rounding theorem. Then in Sections 5 and 6 we demonstrate how our rounding theorem can be used to obtain approximation algorithms. In the Appendix we provide details on how to turn the existential proofs into polynomial time algorithms using semidefinite programming and how to solve the presented LP relaxations in polynomial time.

¹ Some examples: In Generalized Cost Variable Size Bin Packing a list of bin types $j = 1, \ldots, k$, each one with individual cost $c_j \in [0,1]$ and capacity $b_j \in [0,1]$ is given (see [EL08] for an APTAS). We can obtain an $OPT + O(\log^2 n)$ approximation. In its well-studied special case of Variable Size Bin Packing the bin costs equal the bin capacities (i.e. $c_j = b_j$ for all $j$) and we can refine the bound to $OPT + O(\log^2 OPT)$ (see [Mur87] for an AFPTAS). For Bin Packing With Cardinality Constraints, no bin may receive more than $K$ items [EL09]. We can get an $OPT + O(\log^2 n)$ approximation. However, we postpone proofs of these claims to the full version.
2 Preliminaries

The entropy of a random variable $Z$ is defined as

$$H(Z) = \sum_x \Pr[Z = x] \cdot \log_2\Big(\frac{1}{\Pr[Z = x]}\Big)$$


Here the sum runs over all values that $Z$ can attain. Imagine that a data source generates a string of $n$ symbols according to distribution $Z$. Then intuitively, an optimum compression needs asymptotically for $n \to \infty$ an expected number of $n \cdot H(Z)$ many bits to encode the string. Two useful facts on entropy are:

• Uniform distribution maximizes entropy: If $Z$ attains $k$ distinct values, then $H(Z)$ is maximal if $Z$ is the uniform distribution. In that case $H(Z) = \log_2(k)$. Conversely, if $H(Z) \le \delta$, then there must be at least one event $x$ with $\Pr[Z = x] \ge (\frac{1}{2})^\delta$.

• Subadditivity: If $Z, Z'$ are random variables and $f$ is any function, then $H(f(Z, Z')) \le H(Z) + H(Z')$.
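Both facts can be checked numerically on small finite distributions. The following Python sketch is illustrative only; distributions are represented as dictionaries mapping outcomes to probabilities, and the helper names `entropy` and `pushforward` are hypothetical.

```python
import math


def entropy(dist):
    """H(Z) = sum_x Pr[Z=x] * log2(1 / Pr[Z=x]) for a dict {outcome: probability}."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)


def pushforward(dist, f):
    """Distribution of f(Z) when Z is distributed according to dist."""
    out = {}
    for outcome, p in dist.items():
        image = f(outcome)
        out[image] = out.get(image, 0) + p
    return out
```

For two independent fair bits $(Z, Z')$ and $f(Z, Z') = Z + Z'$, the entropy of the sum is $1.5$ bits, which is indeed below $H(Z) + H(Z') = 2$.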

We define $H_{\chi \in \{\pm 1\}^m} f(\chi)$ as the entropy of $f(\chi)$, where $\chi$ is uniformly chosen from $\{\pm 1\}^m$. See the book of [AS08] for an intensive introduction to properties of the entropy function. We will make use of the Azuma-Hoeffding Inequality (see e.g. Theorem 12.4 in [MU05]).

Lemma 1. Let $X_1, \ldots, X_n$ be random variables with $|X_i| \le \alpha_i$ and $E[X_i \mid X_1, \ldots, X_{i-1}] = 0$ for all $i = 1, \ldots, n$. Let $X := \sum_{i=1}^n X_i$. Then $\Pr[|X| \ge \lambda \|\alpha\|_2] \le 2e^{-\lambda^2/2}$ for any $\lambda \ge 0$. This still holds if the distribution of $X_i$ is an arbitrary function of $X_1, \ldots, X_{i-1}$.

The sequence $X_1, \ldots, X_n$ is called a Martingale and the $\alpha_i$'s are the corresponding step sizes. Another tool that we are going to use is a special case of the so-called Isoperimetric Inequality of Kleitman [Kle66].

Lemma 2. For any $X \subseteq \{0,1\}^m$ of size $|X| \ge 2^{0.8m}$ and $m \ge 2$, there are $x, y \in X$ with $\|x - y\|_1 \ge m/2$.

A function $\chi : [m] \to \{0, \pm 1\}$ is called a partial coloring. If at most half of the entries are 0, then $\chi$ is called a half-coloring. For a quantity $z \in \mathbb{R}$, $\lceil z \rfloor$ denotes the integer that is closest to $z$ (say in case of a tie we round down). If $z \in \mathbb{R}^m$, then $\lceil z \rfloor = (\lceil z_1 \rfloor, \ldots, \lceil z_m \rfloor)$. For a matrix $A \in \mathbb{R}^{n \times m}$ and $J \subseteq \{1, \ldots, m\}$, $A^J$ denotes the submatrix containing only the columns indexed in $J$. A submatrix $A' \subseteq A$ will always correspond to a subset of columns of $A$, i.e. $A' \in \mathbb{R}^{n \times m'}$ with $m' \le m$. If $\chi : J \to \mathbb{R}$ is only defined on a subset $J \subseteq \{1, \ldots, m\}$ and we write $A\chi$, then we implicitly fill the undefined entries in $[m] \setminus J$ with zeros. We say that an entry $x_i \in [0, 1[$ has a finite dyadic expansion with $K$ bits if there is a sequence $b_1, \ldots, b_K \in \{0,1\}$ with $x_i = \sum_{k=1}^{K} 2^{-k} \cdot b_k$.



3 Discrepancy theory revisited 

Initially the entropy method was developed to find a coloring $\chi : [m] \to \{\pm 1\}$ minimizing $|\sum_{i \in S} \chi_i|$ for all sets $S$ in a set system, or equivalently to color columns of the incidence matrix $A \in \{0,1\}^{n \times m}$ of the set system in order to minimize $\|A\chi\|_\infty$. In contrast, in our setting the matrix $A$ can have arbitrary entries, but the main technique still applies.
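Theorem 3. Let $A \in \mathbb{R}^{n \times m}$ be a matrix with parameters $\Delta_1, \ldots, \Delta_n > 0$ such that

$$H_{\chi \in \{\pm 1\}^m}\Big(\Big\{\Big\lceil \frac{A_i\chi}{2\Delta_i} \Big\rfloor\Big\}_{i=1,\ldots,n}\Big) \le \frac{m}{5}.$$

Then there exists a half-coloring $\chi : [m] \to \{\pm 1, 0\}$ with $|A_i\chi| \le \Delta_i$ for all $i = 1, \ldots, n$.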
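Proof. From the assumption, we obtain that there must be a $b \in \mathbb{Z}^n$ such that

$$\Pr_{\chi \in \{\pm 1\}^m}\Big[\Big\{\Big\lceil \frac{A_i\chi}{2\Delta_i} \Big\rfloor\Big\}_{i=1,\ldots,n} = b\Big] \ge \Big(\frac{1}{2}\Big)^{m/5}.$$

In other words there is a subset $Y \subseteq \{\pm 1\}^m$ of at least $2^m \cdot (\frac{1}{2})^{m/5} = 2^{\frac{4}{5}m}$ colorings such that $\lceil \frac{A_i\chi}{2\Delta_i} \rfloor = b_i$ for all $\chi \in Y$ and $i = 1, \ldots, n$. The Isoperimetric Inequality (Lemma 2) then yields the existence of $\chi', \chi'' \in Y$ with $|\{j \mid \chi'_j \ne \chi''_j\}| \ge m/2$. We choose $\chi_j := \frac{1}{2}(\chi'_j - \chi''_j)$, then $\chi \in \{0, \pm 1\}^m$ is the desired half-coloring. Finally, let us inspect the discrepancy of $\chi$: $|A_i\chi| \le \frac{1}{2}|A_i\chi' - A_i\chi''| \le \Delta_i$. □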
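The pigeonhole step of this proof can be imitated on toy instances by literally bucketing all colorings according to their rounded outcome and then searching one bucket for two colorings that differ in at least half of the entries. The Python sketch below does exactly that by exhaustive search; it is an illustration for very small $m$ only, it returns `None` when no suitable pair happens to exist, and the name `half_coloring_by_pigeonhole` is hypothetical.

```python
import itertools
import math
from collections import defaultdict


def half_coloring_by_pigeonhole(A, deltas):
    """Find chi = (chi' - chi'')/2 with chi', chi'' in the same rounding bucket."""
    m = len(A[0])
    buckets = defaultdict(list)
    for chi in itertools.product((-1, 1), repeat=m):
        # Bucket key: nearest integer to A_i chi / (2 Delta_i), ties rounded down.
        key = tuple(
            math.ceil(sum(a * c for a, c in zip(row, chi)) / (2 * d) - 0.5)
            for row, d in zip(A, deltas)
        )
        buckets[key].append(chi)
    for group in buckets.values():
        for chi1, chi2 in itertools.combinations(group, 2):
            diff = [(a - b) // 2 for a, b in zip(chi1, chi2)]
            if 2 * sum(1 for v in diff if v != 0) >= m:
                return diff  # entries in {-1, 0, 1}, at least m/2 non-zero
    return None
```

Two colorings in the same bucket satisfy $|A_i\chi' - A_i\chi''| \le 2\Delta_i$ for every row, so any returned vector automatically has $|A_i\chi| \le \Delta_i$.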

The core of this proof was to show that there is an exponential number of colorings $\chi', \chi''$ that are similar, meaning that $A\chi' \approx A\chi''$. This was done by considering disjoint intervals of length $2\Delta_i$ (for every $i$) and using entropy to argue that many colorings must fall into the same intervals. But on the other hand, $A_i\chi'$ and $A_i\chi''$ might be very close to each other while they fall into different intervals, and $\chi', \chi''$ would then not count as being similar.

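Hence we want to generalize the notion of similarity from Theorem 3. Let $A \in \mathbb{R}^{n \times m}$ be a matrix and $\Delta = (\Delta_1, \ldots, \Delta_n)$ be a vector with $\Delta_i > 0$. Then we define the $\Delta$-approximate entropy of $A$ as²

$$H_\Delta(A) := \min_{f_1, \ldots, f_n : \{\pm 1\}^m \to \mathbb{R}} \Big\{ H_{\chi \in \{\pm 1\}^m}\big(f_1(\chi), \ldots, f_n(\chi)\big) \ :\ |A_i\chi - f_i(\chi)| \le \Delta_i \ \ \forall i = 1, \ldots, n \Big\}$$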



First of all note that $H_\Delta(A)$ is always upper bounded by the entropy of the random variables $\lceil \frac{A_i\chi}{2\Delta_i} \rfloor$, since one can choose $f_i(\chi) := 2\Delta_i \cdot \lceil \frac{A_i\chi}{2\Delta_i} \rfloor$. On the other hand, the claim of Theorem 3 still holds true if the assumption is replaced by $H_\Delta(A) \le \frac{m}{5}$, since then one has exponentially many colorings $Y$ such that the values $f_i(\chi)$ coincide for every $\chi \in Y$, and hence for every half-coloring $\chi := \frac{1}{2}(\chi' - \chi'')$ obtained from colorings $\chi', \chi'' \in Y$ one has $|A_i\chi| \le \frac{1}{2}|(A_i\chi' - f_i(\chi')) - (A_i\chi'' - f_i(\chi''))| \le \Delta_i$. More formally:

Corollary 4. Let $A \in \mathbb{R}^{n \times m}$, $\Delta := (\Delta_1, \ldots, \Delta_n) > 0$ with $H_\Delta(A) \le \frac{m}{5}$. Then there exists a half-coloring $\chi : [m] \to \{\pm 1, 0\}$ with $-\Delta \le A\chi \le \Delta$.

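Moreover, $H_\Delta$ is also subadditive, i.e. $H_{(\Delta, \Delta')}\big(\big[\begin{smallmatrix} A \\ B \end{smallmatrix}\big]\big) \le H_\Delta(A) + H_{\Delta'}(B)$, which follows directly from the subadditivity of the entropy function (here $\big[\begin{smallmatrix} A \\ B \end{smallmatrix}\big]$ is obtained by stacking matrices $A$ and $B$).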

For now let us consider a concrete method of bounding the entropy of a random variable of the form $\lceil \frac{\alpha^T\chi}{2\Delta} \rfloor$, where $\alpha$ is one of the row vectors of $A$. Recall that this immediately upper bounds $H_\Delta(\alpha)$. For this purpose, we again slightly adapt a lemma from discrepancy theory (see e.g. Chapter 4 in [Mat99]).



² The minimum is always attained since all probabilities are multiples of $(\frac{1}{2})^m$ and consequently the entropy can attain only a finite number of values.
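Lemma 5. Let $\alpha \in \mathbb{R}^m$ be a vector and $\Delta > 0$. For $\lambda := \frac{\Delta}{\|\alpha\|_2}$,

$$H_{\chi \in \{\pm 1\}^m}\Big(\Big\lceil \frac{\alpha^T\chi}{2\Delta} \Big\rfloor\Big) \le G(\lambda) := \begin{cases} 9e^{-\lambda^2/5} & \text{if } \lambda \ge 2 \\ \log_2(32 + 64/\lambda) & \text{if } \lambda < 2 \end{cases}$$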



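The proof can be found in Appendix C. But the intuition is as follows: Abbreviate $Z := \lceil \frac{\alpha^T\chi}{2\Delta} \rfloor$. Then $\Pr[Z = 0] \ge 1 - e^{-\Omega(\lambda^2)}$ and $\Pr[Z = i] \le e^{-\Omega(i^2\lambda^2)}$ for $i \ne 0$. A simple calculation yields that $H(Z) \le e^{-\Omega(\lambda^2)}$. But for $\lambda \ll 2$, with high probability one has at least $|Z| \le O(\frac{1}{\lambda})$ and consequently $H(Z) \le \log O(\frac{1}{\lambda})$.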

The following function $G^{-1}(b)$ will denote the discrepancy bound $\Delta$ that we need to impose if we do not want to account an entropy contribution of more than $b$.

$$G^{-1}(b) := \begin{cases} \sqrt{10 \ln(\tfrac{9}{b})} & 0 < b \le 6 \\ 128 \cdot (\tfrac{1}{2})^b & b > 6 \end{cases}$$

Strictly speaking, $G^{-1}$ is not the inverse of $G$, but it is not difficult to verify that $G(G^{-1}(b)) \le b$ for all $b > 0$. In other words, for any vector $\alpha$ and value $b > 0$, we can choose $\Delta := G^{-1}(b) \cdot \|\alpha\|_2$, then $H\big(\lceil \frac{\alpha^T\chi}{2\Delta} \rfloor\big) \le b$.


4 The main theorem 

Now we have all ingredients for our main theorem, in which we iteratively round a fractional vector $x$ using half-colorings $\chi$. Concerning the choice of parameters $\Delta$, one has in principle two options: One can either give static bounds $\Delta_i$ to rows $A_i$ such that $H_\Delta(A') \le \frac{\#\mathrm{col}(A')}{10}$ holds for any submatrix $A' \subseteq A$; or one can assign a fixed fraction to each row and then let $\Delta_i$ be a function of $\#\mathrm{col}(A')$. In fact, we will combine these approaches, which will turn out to be useful later.

Theorem 6. Assume the following is given: A matrix $A \in \mathbb{R}^{n_A \times m}$, parameters $\Delta = (\Delta_1, \ldots, \Delta_{n_A}) > 0$ such that $\forall J \subseteq \{1, \ldots, m\} : H_\Delta(A^J) \le \frac{|J|}{10}$, a matrix $B \in [-1, 1]^{n_B \times m}$, weights $\mu_1, \ldots, \mu_{n_B} > 0$ with $\sum_{i=1}^{n_B} \mu_i \le 1$, a vector $x \in [0,1]^m$ and an objective function $c \in [-1, 1]^m$. Then there is a random variable $y \in \{0,1\}^m$ with

• Preserved expectation: $E[c^T y] = c^T x$, $E[Ay] = Ax$, $E[By] = Bx$.

• Bounded difference: $|c^T x - c^T y| \le O(1)$; $|A_i x - A_i y| \le \log(\min\{4n, 4m\}) \cdot \Delta_i$ for all $i = 1, \ldots, n_A$ ($n := n_A + n_B$); $|B_i x - B_i y| \le O(\sqrt{1/\mu_i})$ for all $i = 1, \ldots, n_B$.

• Tail bounds: $\forall i : \forall \lambda \ge 0 : \Pr[|A_i x - A_i y| \ge \lambda \cdot \sqrt{\log(\min\{4n, 4m\})} \cdot \Delta_i] \le 2e^{-\lambda^2/2}$.

Proof. First, observe that we can append the objective function as an additional row to matrix $B$ (with a weight of say $\mu_c := \frac{1}{2}$ and halving the other $\mu_i$'s), and so we ignore it from now on. Next, consider the linear system

$$Az = Ax$$
$$Bz = Bx$$
$$0 \le z_j \le 1 \quad \forall j = 1, \ldots, m$$

and let $z$ be a basic solution. Apart from the 0/1 bounds, the system has only $n$ constraints, hence the number of entries $j$ with $0 < z_j < 1$ is bounded by $n$. One can remove columns of $A$ with $z_j \in \{0,1\}$ and apply the Theorem to the residual instance. Hence we set $x := z$ and assume from now on that $m \le n$.
Furthermore we assume that $x$ has a finite dyadic expansion, i.e. every entry $x_j$ can be written in binary encoding with $K$ bits, for some $K \in \mathbb{N}$. This can be achieved by randomly rounding the entries of $x$ to either the nearest larger or smaller multiple of $(\frac{1}{2})^K$ for a polynomially large $K$, while the error is exponentially small in $K$. We perform the following rounding procedure:

(1) WHILE $x$ not integral DO

(2)    Let $k \in \{1, \ldots, K\}$ be the index of the least value bit in any entry of $x$

(3)    $J := \{j \in \{1, \ldots, m\} \mid x_j$'s $k$th bit is 1$\}$

(4)    Choose $\chi \in \{0, \pm 1\}^m$ with $\chi(j) = 0$ for $j \notin J$, $|\mathrm{supp}(\chi)| \ge |J|/2$, $|A_i\chi| \le \Delta_i$ and $|B_i\chi| \le G^{-1}\big(\frac{\mu_i |J|}{10}\big) \cdot \sqrt{|J|}$ for all $i$.

(5)    With probability $\frac{1}{2}$, flip all signs in $\chi$

(6)    Update $x := x + (\frac{1}{2})^k \chi$

The interval of iterations in which bit $k$ is rounded is termed phase $k$. Let $x^{(k)}$ be the value of $x$ at the beginning of phase $k$ and let $x^{(k,t)}$ denote the value of $x$ at the beginning of the $t$th to last iteration of phase $k$. From now on $x$ always denotes the initial value, i.e. $x = x^{(K)}$, and our choice for the rounded vector is $y := x^{(0)}$.

Observe that flipping the signs in step (5) ensures that the expectations are preserved, i.e. $E[Ay] = Ax$ and $E[By] = Bx$. There are two main issues: (I) showing that the choice of $\chi$ in step (4) is always possible; (II) bounding the rounding error of $y$ w.r.t. $x$.



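Claim (I). For any $J \subseteq \{1, \ldots, m\}$ there is a $\chi \in \{0, \pm 1\}^m$ with $\chi(j) = 0$ for $j \notin J$, $|\mathrm{supp}(\chi)| \ge |J|/2$, $|A_i\chi| \le \Delta_i$ and $|B_i\chi| \le G^{-1}\big(\frac{\mu_i |J|}{10}\big) \cdot \sqrt{|J|}$ for all $i$.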



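Proof of claim. Our aim is to apply Theorem 3 to the stacked $n \times |J|$ matrix $\tilde{A} = \big[\begin{smallmatrix} A^J \\ B^J \end{smallmatrix}\big]$ with parameter $\tilde{\Delta} := (\Delta, \Delta')$ and $\Delta'_i := G^{-1}\big(\frac{\mu_i |J|}{10}\big) \cdot \sqrt{|J|}$. Note that $\|B_i^J\|_2 \le \sqrt{|J|}$ since $B$ has entries in $[-1, 1]$, hence the entropy that we need to account to the $i$th row of $B$ is $H\big(\lceil \frac{B_i^J\chi}{2\Delta'_i} \rfloor\big) \le \frac{\mu_i |J|}{10}$. By subadditivity of the (approximate) entropy function and the assumption that $H_\Delta(A^J) \le \frac{|J|}{10}$,

$$H_{\tilde{\Delta}}(\tilde{A}) \le H_\Delta(A^J) + \sum_{i=1}^{n_B} H_{\chi \in \{\pm 1\}^J}\Big(\Big\lceil \frac{B_i^J\chi}{2\Delta'_i} \Big\rfloor\Big) \le \frac{|J|}{10} + \sum_{i=1}^{n_B} \frac{\mu_i |J|}{10} \le \frac{|J|}{5}.$$

Thus the requirements of Theorem 3 are met, which then implies the existence of the desired half-coloring and Claim (I) follows.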

The next step is to bound the rounding error. 



Claim (II). One has $|A_i x - A_i y| \le \log(2m) \cdot \Delta_i$ for all $i = 1, \ldots, n_A$ and $|B_i x - B_i y| \le O(\sqrt{1/\mu_i})$ for all $i = 1, \ldots, n_B$.

Proof of claim. Let $J(k,t) = \{j \mid x_j^{(k,t)}$'s $k$th bit is 1$\}$ denote the set $J$ in the $t$th to last iteration of phase $k$, i.e. $J(k,t) \supseteq J(k,t-1)$ for any $t$. Since the cardinality of $J(k,t)$ drops by a factor of at least $1/2$ from iteration to iteration, we have $|J(k,t-1)| \le \frac{1}{2} \cdot |J(k,t)|$ for any $t$. Hence each phase has at most $\log_2(m) + 1$ iterations. Then for any $i = 1, \ldots, n_A$

$$|A_i y - A_i x| \le \sum_{k=1}^{K} \sum_{t \ge 0} \big|A_i\big(x^{(k,t)} - x^{(k,t+1)}\big)\big| \le \sum_{k=1}^{K} \log(2m) \Big(\frac{1}{2}\Big)^k \Delta_i \le \log(2m) \cdot \Delta_i \qquad (2)$$

using that $|A_i(x^{(k,t)} - x^{(k,t+1)})| \le (\frac{1}{2})^k \Delta_i$ and $\sum_{k \ge 1} (\frac{1}{2})^k = 1$. Next, consider

³ Note that we have the term $\min\{4m, 4n\}$ instead of $\min\{2m, 2n\}$ in the claim, to account for the rounding error to obtain a vector $x$ with dyadic expansion and to account for the extra row $c$ that we appended to $B$.



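$$|B_i x - B_i y| \le \sum_{k=1}^{K} \Big(\frac{1}{2}\Big)^k \sum_{t \ge 0} G^{-1}\Big(\frac{\mu_i |J(k,t)|}{10}\Big) \cdot \sqrt{|J(k,t)|}$$

$$\overset{(*)}{\le} \sum_{z \in \mathbb{Z}} G^{-1}(6 \cdot 2^z) \cdot \sqrt{\frac{10 \cdot 6 \cdot 2^{z+1}}{\mu_i}}$$

$$\overset{\text{Def. } G^{-1}}{\le} \sqrt{\frac{120}{\mu_i}} \Big[ \sum_{z > 0} 128 \Big(\frac{1}{2}\Big)^{6 \cdot 2^z} \cdot 2^{z/2} + \sum_{z \le 0} \sqrt{10 \ln\Big(\frac{9}{6 \cdot 2^z}\Big)} \cdot 2^{z/2} \Big]$$

$$\overset{(**)}{\le} O\big(\sqrt{1/\mu_i}\big)$$

In $(*)$ we use that since $|J(k,t+1)| \ge 2 \cdot |J(k,t)|$, for any $k$ and $z$ there is at most one $t$ such that $6 \cdot 2^z \le \frac{\mu_i |J(k,t)|}{10} < 6 \cdot 2^{z+1}$, together with the facts that $G^{-1}$ is non-increasing and that $\sum_{k \ge 1} (\frac{1}{2})^k = 1$; $(**)$ follows from the convergence of $\sum_{z \ge 0} (\frac{1}{2})^{2^z - z/2}$ and $\sum_{z \ge 0} \sqrt{z} \cdot (\frac{1}{2})^z$.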

Inspecting (2) again, we see that $A_i x - A_i y$ is a Martingale and the step size in iteration $t \le \log(2m)$ of phase $k$ is bounded by $\alpha_{k,t} := (\frac{1}{2})^k \Delta_i$. Observe that $\|\alpha\|_2 \le \Delta_i \sqrt{\log_2(2m)}$, hence the tail bound $\Pr[|A_i x - A_i y| \ge \lambda \cdot \sqrt{\log(2m)} \cdot \Delta_i] \le 2e^{-\lambda^2/2}$ for all $\lambda \ge 0$ follows from the Azuma-Hoeffding Inequality (Lemma 1). This concludes the proof of the theorem. □

Moreover, we can also compute such a vector $y$ as guaranteed by the theorem in polynomial time, with the only exception that the guaranteed bound on $|B_i x - B_i y|$ is slightly weaker. But we can still guarantee that $E[|B_i x - B_i y|] = O(\sqrt{1/\mu_i})$, which is already sufficient for our applications. We postpone the algorithmic details to Appendix A.

For one example application, let $B \in \{0,1\}^{n \times n}$ be the incidence matrix of a set system with $n$ sets on a ground set of $n$ elements. Then apply Theorem 6 with $A = \mathbf{0}$, $x = (\frac{1}{2}, \ldots, \frac{1}{2})$ and $\mu_i = \frac{1}{n}$ to obtain a $y \in \{0,1\}^n$ with $\|Bx - By\|_\infty = O(\sqrt{n})$. The coloring $\chi \in \{\pm 1\}^n$ with $y = x + \frac{1}{2}\chi$ is then an $O(\sqrt{n})$ discrepancy coloring, matching the bound of Spencer [Spe85]. Note that no proof using a different technique is known for Spencer's theorem. Hence it seems unlikely that Theorem 6 (in particular the dependence on $1/\mu_i$) could be achieved by standard techniques (such as using properties of basic solutions or the usual independent randomized rounding).



5 Application: Bin Packing with Rejection 

For classical Bin Packing, the input consists of a list of item sizes $1 \ge s_1 \ge \ldots \ge s_n > 0$ and the goal is to assign the items to a minimum number of bins of size 1. For the performance of heuristics like First Fit, Next Fit and First Fit Decreasing, see [Joh73, JDU+74, CGJ84]. A proof of strong NP-hardness can be found in [GJ79]. Fernandez de la Vega and Lueker [FdlVL81] developed an asymptotic polynomial time approximation scheme (APTAS). Later, Karmarkar and Karp [KK82] (see also [KV02, WS11]) found an algorithm that needs at most $O(\log^2 OPT)$ bins more than the optimum solution.
In this section, we provide an application of our Entropy Rounding Theorem to the more general problem of Bin Packing With Rejection, where every item $i$ can either be packed into a unit cost bin or it can be rejected at cost $\pi_i > 0$ (which is also part of the input). The first constant factor approximation and online algorithms were studied in [DH06]. Later an asymptotic PTAS was developed by [Eps06, Eps10] (see [BCH08] for a faster APTAS). Recently Epstein & Levin [EL10] found an algorithm with running time polynomial in the input length and $\frac{1}{\varepsilon}$ which provides solutions of quality $(1 + \varepsilon)OPT + 2^{O(\frac{1}{\varepsilon} \log \frac{1}{\varepsilon})}$ (implying an $APX \le OPT + \frac{OPT}{(\log OPT)^{1-o(1)}}$ algorithm for an optimum choice of $\varepsilon$).

An asymptotic FPTAS (AFPTAS) is defined as an approximation algorithm $APX$ producing solutions with $APX \le (1 + \varepsilon)OPT + f(1/\varepsilon)$ in polynomial time (both in the input length and $\frac{1}{\varepsilon}$). But there is some ambiguity in the literature concerning the term $f(1/\varepsilon)$. According to [KV02] and [ZMO07], $f$ can be any function (equivalent to the requirement $APX \le OPT + o(OPT)$), while Johnson [Joh85] requires $f$ to be bounded by a polynomial (which is equivalent to $APX \le OPT + O(OPT^{1-\delta})$ for a fixed $\delta > 0$). However, we will now obtain a polynomial time algorithm for Bin Packing With Rejection with $APX \le OPT + O(\log^2 OPT)$, which also satisfies the stronger definition of Johnson [Joh85] and matches the bound for the special case of Bin Packing (without rejection).

We define a set system $\mathcal{S} = \mathcal{B} \cup \mathcal{R}$ with potential bin patterns $\mathcal{B} = \{S \subseteq [n] \mid \sum_{i \in S} s_i \le 1\}$ (each set $S \in \mathcal{B}$ has cost $c_S := 1$) and rejections $\mathcal{R} = \{\{i\} \mid i \in [n]\}$ at cost $c_{\{i\}} := \pi_i$ for $i \in [n]$. Then a natural column-based LP is

$$OPT_f = \min\Big\{c^T x \ \Big|\ \sum_{S \in \mathcal{S}} x_S \mathbf{1}_S = \mathbf{1},\ x \ge 0\Big\} \qquad (3)$$

where $\mathbf{1}_S \in \{0,1\}^n$ denotes the characteristic vector of $S$. In [EKRS11], the Karmarkar-Karp technique [KK82] was modified to obtain an $O(\sqrt{n} \cdot \log^{3/2} n)$ bound on the additive integrality gap of (3). Note that due to the dependence on $n$, such a bound does not satisfy the definition of an AFPTAS, and hence is incomparable to the result of [EL09]. But since $OPT_f \le n$, our result improves over both bounds [EL09, EKRS11].

Despite the exponential number of variables in LP (3), one can compute a basic solution $x$ with $c^T x \le OPT_f + \delta$ in time polynomial in $n$ and $1/\delta$ [KK82] using either the Grötschel-Lovász-Schrijver variant of the Ellipsoid method [GLS81] or the Plotkin-Shmoys-Tardos framework for covering and packing problems [PST95]. Since this fact is rather standard, we postpone details to Appendix B.

In the following we always assume that the items are sorted w.r.t. their sizes such that $s_1 \ge \ldots \ge s_n$ and $\pi_i \le 1$ for all $i = 1, \ldots, n$. A feasible solution $y \in \{0,1\}^{\mathcal{S}}$ will reserve at least one slot for every item, i.e. $i$ many slots for items $1, \ldots, i$. The quantity $i - \sum_{S \in \mathcal{S}} y_S |S \cap \{1, \ldots, i\}|$, if positive, is called the deficit of $\{1, \ldots, i\}$. It is not difficult to see that if there is no deficit for any of the sets $\{1, \ldots, i\}$, then every item can be assigned to a slot, potentially of a larger item⁴ (while in case that $y_S = 1$ for $S \in \mathcal{R}$, the slot for $S = \{i\}$ would only be used for that particular item).

We term the constraint matrix $P$ of the system (3) the pattern matrix. Note that some columns of $P$ correspond to bins, others correspond to rejections. The obvious idea would be to apply our rounding theorem to $P$, but this would not yield any reasonable bound. Instead, we define another matrix $A$ of the same format as $P$, where $A_i := \sum_{i'=1}^{i} P_{i'}$, or equivalently, the entries are defined as $A_{iS} = |S \cap \{1, \ldots, i\}|$. The intuition behind this is that if $x$ is a feasible fractional solution then $Ay - Ax \ge 0$ iff $y$ does not have any deficit. Indeed, we will apply Theorem 6 to this cumulated pattern matrix $A$. As a prerequisite, we need a strong upper bound on the approximate entropy of any submatrix.



⁴ Proof sketch: Assign input items $i$ iteratively in increasing order (starting with the largest one, i.e. $i = 1$) to the smallest available slot. If there is none left for item $i$, then there are fewer than $i$ slots for items $1, \ldots, i$, thus this interval had a deficit.
Lemma 7. Let $A \in \mathbb{Z}_{\ge 0}^{n \times m}$ be any matrix in which column $j$ has non-decreasing entries from $\{0, \ldots, b_j\}$; let $\sigma = \sum_{j=1}^{m} b_j$ be the sum over the largest entries in each column and $\beta := \max_{j=1,\ldots,m} b_j$ be the maximum of those entries. For $\Delta > 0$ one has

$$H_\Delta(A) \le O\Big(\frac{\sigma\beta}{\Delta^2}\Big)$$

Proof. We can add rows (and delete identical rows) such that consecutive rows differ in exactly one entry. This can never lower the approximate entropy. Now we have exactly $n = \sigma$ many rows. There is no harm in assuming that $\sigma$, $\Delta$ and $\beta$ are powers of 2. Let $B \in \mathbb{R}^{\sigma \times m}$ be the matrix with $B_i = A_i - A_{i-1}$ (and $B_1 = A_1$). In other words, $B$ is a 0/1 matrix with exactly a single one per row.

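Consider the balanced binary laminar dissection $\mathcal{D} := \{\{2^k(i-1)+1,\ldots,2^k i\} \mid k = 0,\ldots,\log\sigma;\ i = 1,\ldots,\sigma/2^k\}$
of the row indices $\{1,\ldots,\sigma\}$. In other words, $\mathcal{D}$ contains $2^k$ many intervals of length $\sigma/2^k$ for
$k = 0,\ldots,\log_2\sigma$. For every such interval $D \in \mathcal{D}$ we define the vector $C_D := \sum_{i \in D} B_i \in \mathbb{Z}^m$ and the
parameter $\Delta_D := \frac{\Delta}{32 \cdot 1.1^{|z|}}$ if $|D| = 2^z \cdot \frac{\Delta^2}{\beta}$ (with $z \in \mathbb{Z}$). Note that $B$ contains a single one per row and at most $\beta$
ones per column, thus (footnote 6)

$$\|C_D\|_2 \le \left(\|C_D\|_\infty \cdot \|C_D\|_1\right)^{1/2} \le \left(\beta \cdot 2^z \frac{\Delta^2}{\beta}\right)^{1/2} = \Delta \cdot 2^{z/2}.$$

For every $i$, note that $\{1,\ldots,i\}$ can be written as a disjoint union of some intervals $D_1,\ldots,D_q \in \mathcal{D}$, all
of different size. We choose

$$f_i(\chi) := \sum_{p=1}^{q} 2\Delta_{D_p} \cdot \left\lceil \frac{C_{D_p}\chi}{2\Delta_{D_p}} \right\rfloor.$$

Then

$$|A_i\chi - f_i(\chi)| \le \sum_{p=1}^{q} \Delta_{D_p} \le 2\sum_{z \ge 0} \frac{\Delta}{32 \cdot 1.1^{z}} \le \Delta.$$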



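Thus $H_\Delta(A)$ is upper bounded by the entropy of the random variables $f_1(\chi),\ldots,f_\sigma(\chi)$. But since each $f_i$ is
a function of $\{\lceil C_D\chi/(2\Delta_D)\rfloor\}_{D \in \mathcal{D}}$, it in fact suffices to bound the entropy of the latter random variables.

We remember that for all $z \in \mathbb{Z}$, one has at most $\frac{\sigma\beta}{2^z\Delta^2}$ many intervals $D \in \mathcal{D}$ of size $|D| = 2^z \cdot \frac{\Delta^2}{\beta}$, each with
$\|C_D\|_2 \le 2^{z/2} \cdot \Delta$ and

$$\Delta_D = \frac{\Delta}{32 \cdot 1.1^{|z|}} \ge \frac{1}{32 \cdot 1.1^{|z|} \cdot 2^{z/2}} \cdot \|C_D\|_2.$$

Finally, by subadditivity and Lemma 5,

$$H_\Delta(A) \le H\left(\left\{\left\lceil \frac{C_D\chi}{2\Delta_D}\right\rfloor\right\}_{D\in\mathcal{D}}\right) \le \sum_{z\in\mathbb{Z}} \frac{\sigma\beta}{2^z\Delta^2}\cdot G\left(\frac{1}{32\cdot 1.1^{|z|}\cdot 2^{z/2}}\right) = O\left(\frac{\sigma\beta}{\Delta^2}\right).$$

Here the last step uses the definition of $G$: splitting the sum $\sum_{z}(1/2)^z\,G(\cdot)$ into the ranges $z < -15$ and
$z \ge -15$, both parts are $O(1)$.

□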
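The step of the proof that writes a prefix $\{1,\ldots,i\}$ as a disjoint union of intervals of pairwise different sizes is just the binary expansion of $i$. A minimal sketch, assuming $\sigma$ is a power of 2 with $i \le \sigma$ (the function name is hypothetical):

```python
def dyadic_prefix_decomposition(i):
    """Split {1..i} into disjoint intervals of the form {2^k (j-1) + 1, ..., 2^k j}.

    One interval is produced per set bit of i, so all lengths are distinct powers of 2.
    Intervals are returned as (first, last) pairs, longest first.
    """
    intervals = []
    start = 0  # number of indices covered so far; always a multiple of the next length
    for k in reversed(range(i.bit_length())):
        if i & (1 << k):
            intervals.append((start + 1, start + (1 << k)))
            start += 1 << k
    return intervals
```

For example, `dyadic_prefix_decomposition(13)` yields the intervals of lengths 8, 4 and 1 covering {1..13}, so $f_{13}$ would be built from three rounded terms.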

The parametrization used in the above proof is inspired by the work of Spencer, Srinivasan and
Tetali [SST]. A simple consequence of the previous lemma is the following.



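Lemma 8. Let $S_1,\ldots,S_m \subseteq [n]$ be a set system with numbers $1 \ge s_1 \ge \ldots \ge s_n > 0$ such that $\sum_{i \in S_j} s_i \le 1$ for
any set $S_j$. Let $A \in \mathbb{Z}_{\ge 0}^{n \times m}$ be the cumulated pattern matrix, defined by $A_{ij} = |S_j \cap \{1,\ldots,i\}|$. Then there is
a constant $C > 0$ such that for $\Delta_i := \frac{C}{s_i}$, one has $H_\Delta(A) \le \frac{1}{10}\sum_{j=1}^{m}\sum_{i \in S_j} s_i \le \frac{m}{10}$.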



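5 In fact, rounding $\sigma$, $\beta$, $\Delta$ to the nearest power of 2 only affects the constant hidden in the $O(1)$-notation.



6 Here we use Hölder's inequality: $\|x\|_2 \le (\|x\|_\infty \cdot \|x\|_1)^{1/2}$ for every $x \in \mathbb{R}^m$.
Proof. Let $A^\ell$ be the submatrix of $A$ consisting of all rows $i$ such that $s_i > (1/2)^\ell$. Note that row $A_i$ possibly
appears in several $A^\ell$. We apply Lemma 7 to $A^\ell$ with $\beta := \|A^\ell\|_\infty \le 2^\ell$ and $\sigma := \sum_{j=1}^{m} |S_j \cap \{i : s_i > (1/2)^\ell\}|$
to obtain $H_{C \cdot 2^\ell}(A^\ell) \le \frac{1}{20}\sum_{j=1}^{m} (1/2)^\ell \cdot |S_j \cap \{i : s_i > (1/2)^\ell\}|$ for $C$ large enough. Eventually

$$H_\Delta(A) \overset{\text{subadd.}}{\le} \sum_{\ell \ge 0} H_{C\cdot 2^\ell}(A^\ell) \le \frac{1}{20}\sum_{\ell\ge 0}\sum_{j=1}^{m} (1/2)^\ell \cdot |S_j\cap\{i : s_i > (1/2)^\ell\}| \le \frac{1}{10}\sum_{j=1}^{m}\sum_{i\in S_j} s_i$$

since item $i \in S_j$ with $(1/2)^\ell < s_i \le (1/2)^{\ell-1}$ contributes at most $\frac{1}{20}\sum_{\ell' \ge \ell}(1/2)^{\ell'} \le \frac{1}{10}s_i$ to the left hand
side. □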

Our procedure to round a fractional Bin Packing With Rejection solution x will work as follows: 
For a suitable value of e > 0, we term all items of size at least e large and small otherwise. We take the 
cumulated pattern matrix A restricted to the large items. Furthermore we define a matrix B such that 
Bx denotes the space reserved for small items. Then we apply Theorem 6 to obtain an integral vector y
which is then repaired to a feasible solution without significantly increasing the cost of the solution. 

Theorem 9. There is a randomized algorithm for Bin Packing With Rejection with expected polynomial
running time which produces a solution of cost $OPT_f + O(\log^2 OPT_f)$.

Proof. Compute a fractional solution $x \in [0,1]^{\mathcal{S}}$ to the Bin Packing With Rejection LP (3) with cost
$c^T x \le OPT_f + 1$ (see Appendix B for details). We define $\varepsilon := \frac{\log(OPT_f)}{OPT_f}$ (assume $OPT_f \ge 2$). For any item
$i$ that is rejected in $x$ to a fractional extent of more than $1 - \varepsilon$ (i.e. $x_{\{i\}} > 1 - \varepsilon$), we fully reject item $i$. We
account a multiplicative loss of $1/(1 - \varepsilon) \le 1 + 2\varepsilon$ (i.e. an additive cost increase of $O(\log OPT_f)$). From
now on, we may assume that $x_S \le 1 - \varepsilon$ for all sets $S \in \mathcal{R}$. Let $1,\ldots,L$ be the items of size at least $\varepsilon$, hence
$OPT_f \ge \varepsilon^2 L$, since every item is covered with bin patterns at an extent of at least $\varepsilon$. Let $A \in \mathbb{Z}^{L \times \mathcal{S}}$ with

$$A_{iS} := |\{1,\ldots,i\} \cap S|$$

be the cumulated pattern matrix restricted to the large items. According to Lemma 8 for a choice of
$\Delta_i = \Theta(1/s_i)$, one has $H_\Delta(A') \le \frac{\#\mathrm{col}(A')}{10}$ for every submatrix $A' \subseteq A$. Choose $B \in [0,1]^{1 \times \mathcal{S}}$ as the row vector
where $B_{1,S} = \sum_{i \in S : i > L} s_i$ for $S \in \mathcal{S}$ denotes the space in pattern $S$ that is reserved for small items.

We apply Theorem 16 (Theorem 6 suffices for a non-constructive bound on the integrality gap) to
matrices $A$, $B$ and cost function $c$ with $\mu_1 = 1$ to obtain a vector $y \in \{0,1\}^{\mathcal{S}}$ with the following properties

(A) The deficit of any interval $\{1,\ldots,i\}$ of large items (i.e. $i \in \{1,\ldots,L\}$) is bounded by $O(\frac{1}{s_i}\log L)$.

(B) The space for small items reserved by $y$ equals that of $x$ up to an additive constant term (formally
$|\sum_{S \in \mathcal{S}} (x_S - y_S) \cdot B_{1,S}| = O(1)$).

(C) $c^T y \le c^T x + O(1)$

For $\ell \ge 0$, we say that the items $G_\ell := \{i \le L \mid (\frac{1}{2})^\ell \ge s_i > (\frac{1}{2})^{\ell+1}\}$ form group $\ell$. Note that at most $\log\frac{1}{\varepsilon} + 1$
groups contain large items. We eliminate the deficits for large items by packing $O(\log L)$ extra bins
with the largest item from every group, hence leading to $O(\log L) \cdot O(\log\frac{1}{\varepsilon}) = O(\log^2 OPT_f)$ extra bins.
Property (B) implies that after buying $O(1)$ extra bins for small items, it is possible to assign all small
items fractionally to the bought bins (i.e. the small items could be feasibly assigned if it were allowed
to split them). By a standard argument (see e.g. [EL09]) this fractional assignment can be turned into an
integral assignment, if we discard at most one item per pattern, i.e. we pack discarded small items of
total size at most $\varepsilon \cdot (c^T y + O(1))$ separately, which can be done with $O(\varepsilon) \cdot OPT_f + O(1) \le O(\log OPT_f)$
extra bins. The claim follows. □
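The grouping of large items by size used in the proof can be sketched as follows. This is only an illustration: the function names are hypothetical, item indices are 1-based, and logarithms are taken to base 2.

```python
import math

def group_index(size):
    """Return l >= 0 with (1/2)**l >= size > (1/2)**(l+1), assuming 0 < size <= 1."""
    return math.floor(-math.log2(size))

def group_large_items(sizes, eps):
    """Map each group index l to the (1-based) large items it contains."""
    groups = {}
    for i, s in enumerate(sizes, start=1):
        if s >= eps:  # only large items are grouped
            groups.setdefault(group_index(s), []).append(i)
    return groups
```

Since every large item has size at least `eps`, only group indices up to $\log_2(1/\varepsilon)$ can occur, which is the source of the $\log\frac{1}{\varepsilon} + 1$ bound on the number of groups.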
6 Application: The Train Delivery Problem



For the Train Delivery problem, $n$ items are given as input, but now every item $i \in \{1,\ldots,n\}$ has a size
$s_i$ and a position $p_i \in [0,1]$. The goal is to transport the items to a depot, located at 0, using trains of
capacity 1 and minimizing the total tour length (footnote 7). In other words, it is a combination of one-dimensional
vehicle routing and Bin Packing. We define $\mathcal{S}$ as a set system consisting of all sets $S \subseteq [n]$ with $\sum_{i \in S} s_i \le 1$
and cost $c_S := \max_{i \in S} p_i$ for set $S$. Then the optimum value for Train Delivery equals the cost of the
cheapest subset of $\mathcal{S}$ covering all items (ignoring a constant factor of two).

The problem was studied in [DMM10], where the authors provide an APTAS. We will now obtain an
AFPTAS. First, we make our life easier by using a result of [DMM10] saying that modulo a $1 + O(\varepsilon)$ factor
in the approximation guarantee it suffices to solve well-rounded instances which have the property that
$\varepsilon \le p_i \le 1$ and $p_i \in (1 + \varepsilon)^{\mathbb{Z}}$ for all $i = 1,\ldots,n$. The first condition can be obtained by splitting all tours in
a proper way; the second condition is obtained by simply rounding all positions up to the nearest power
of $1 + \varepsilon$. Hence, we can partition the items according to their position by letting $P_j := \{i \mid p_i = (1 + \varepsilon)^{-j}\}$ for
$j = 0,\ldots,t-1$ with $t = O(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon})$.
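The second well-rounding step (rounding positions up to powers of $1 + \varepsilon$ and bucketing the items) might look as follows. This is a sketch under assumptions: positions are floats with $0 < p_i \le 1$, the class index $j$ is chosen so that the rounded position equals $(1+\varepsilon)^{-j}$, the function names are hypothetical, and positions that are already exact powers of $1 + \varepsilon$ may be affected by floating-point error.

```python
import math

def round_position_up(p, eps):
    """Return (j, (1+eps)**(-j)), the smallest power of 1+eps that is >= p."""
    j = math.floor(math.log(1.0 / p) / math.log(1.0 + eps))
    return j, (1.0 + eps) ** (-j)

def partition_by_position(positions, eps):
    """Group item indices (0-based) by the class j of their rounded position."""
    classes = {}
    for i, p in enumerate(positions):
        j, _ = round_position_up(p, eps)
        classes.setdefault(j, []).append(i)
    return classes
```

With `eps = 0.5`, positions 1.0 and 0.7 are rounded to 1 (class 0), while 0.5 and 0.6 are rounded to 2/3 (class 1).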

Our rounding procedure works as follows: analogously to Bin Packing With Rejection, we construct
matrices $A(j)$, $B(j)$ separately for the items at each position $j$. Then we stack them together, apply
Theorem 6 and repair the obtained integral vector to a feasible solution (again analogous to the previous
section). Here we will spend a higher weight $\mu_j$ for positions $j$ which are further away from the depot,
since those are costlier to cover.

Theorem 10. There is a randomized algorithm with expected polynomial running time for Train Delivery,
providing solutions of expected cost $E[APX] \le OPT_f + O(OPT_f^{3/5})$.

Proof. Compute a fractional solution $x$ for the Train Delivery LP (i.e. again LP (3), but with the problem
specific set system and cost vector) of cost $c^T x \le OPT_f + 1$ (see Appendix B for details). We will
choose $\varepsilon := 1/OPT_f^{\delta}$ for some constant $0 < \delta < 1$ that we determine later and assume the instance is
well-rounded.

By $1,\ldots,L$ we denote the large items of size $\ge \varepsilon$. Let $A(j) \in \mathbb{Z}^{(P_j \cap [L]) \times \mathcal{S}}$ with entries $A_{i,S}(j) = |S \cap \{1,\ldots,i\} \cap P_j|$
be the cumulated pattern matrix, restricted to large items at position $P_j$. We equip again
every row $A_i(j)$ with parameter $\Delta_i(j) := \Theta(1/s_i)$. Then we stack $A(0),\ldots,A(t-1)$ together to obtain an
$L \times |\mathcal{S}|$ matrix $A$. Again, we need to show that for any submatrix $A' \subseteq A$, one has $H_\Delta(A') \le \frac{\#\mathrm{col}(A')}{10}$. Let
$\mathcal{S}' \subseteq \mathcal{S}$ be the sets whose characteristic vectors form the columns of $A'$. We apply Lemma 8 individually
to each $A'(j)$ and obtain

$$H_\Delta(A') \le \sum_{j=0}^{t-1} H_{\Delta(j)}(A'(j)) \le \frac{1}{10}\sum_{j=0}^{t-1}\sum_{S \in \mathcal{S}'}\sum_{i \in S \cap P_j} s_i \le \frac{\#\mathrm{col}(A')}{10}$$

using that every set $S$ contains items of total size at most 1. Furthermore, we define a matrix $B \in [0,1]^{t \times \mathcal{S}}$
with $B_{j,S} := \sum_{i \in S \cap P_j : i > L} s_i$ as the space that pattern $S$ reserves for small items at position $j$. We equip the
$j$-th row of $B$ with weight $\mu_j := \frac{\varepsilon}{8} \cdot (1 + \varepsilon/4)^{-j}$, i.e. the weight grows with the distance to the depot. Note
that $\sum_{j \ge 0} \mu_j \le \sum_{j \ge 0} \frac{\varepsilon}{8} \cdot (1 + \varepsilon/4)^{-j} = \frac{1}{2} + \frac{\varepsilon}{8} \le 1$. Moreover $OPT_f \ge \sum_{i \in [n]} s_i p_i \ge \varepsilon^2 \cdot L$ (see [DMM10]), hence
the number of rows of $A$ and $B$ is $L + t \le \mathrm{poly}(OPT_f)$.



7 For definiteness, say the tour must start and end at the depot and once items are loaded into the train, they have to remain
until the depot is reached.

We apply Theorem 16 (again Theorem 6 suffices for an integrality gap bound) to obtain an integral
vector $y \in \{0,1\}^{\mathcal{S}}$ with the following error guarantees:

• Large items: Let $i \in P_j$, $i \le L$. Then the deficit of $\{1,\ldots,i\} \cap P_j$ is bounded by $O(\frac{1}{s_i}\log OPT_f)$.

For every position we use $O(\log^2 OPT_f)$ extra bins to eliminate the deficits of large items, which
costs in total $\sum_{j \ge 0} (1 + \varepsilon)^{-j} \cdot O(\log^2 OPT_f) = O(\frac{1}{\varepsilon}\log^2 OPT_f)$.

• Small items: For position $j$, the expected discrepancy in the reserved space for small items is
$E[|B_j x - B_j y|] \le O(\sqrt{1/\mu_j}) = O(\sqrt{1/\varepsilon} \cdot (1 + \varepsilon/4)^{j/2})$.

We buy $|B_j x - B_j y|$ extra bins to cover small items at position $j$. Their expected cost is bounded
by $O(\sqrt{1/\varepsilon}) \cdot (1 + \varepsilon/4)^{j/2} \cdot (1 + \varepsilon)^{-j} \le O(\sqrt{1/\varepsilon}) \cdot (1 - \varepsilon/2)^j$. In total, this accounts with an expected cost
increase of $O(\sqrt{1/\varepsilon}) \sum_{j \ge 0} (1 - \varepsilon/2)^j = O(1) \cdot (1/\varepsilon)^{3/2}$ for all positions. Now, for every position the
space reserved for small items is at least as large as the required space, hence the small items can
be assigned fractionally. Then after discarding at most one small item per pattern, even an integral
assignment is possible. We account this with a multiplicative factor of $1/(1 - \varepsilon) \le 1 + 2\varepsilon$.

Summing up the bought extra bins, we obtain a solution APX with

$$E[APX] \le (1 + O(\varepsilon))\,OPT_f + O\left(\frac{1}{\varepsilon}\log^2 OPT_f\right) + O((1/\varepsilon)^{3/2}) \le OPT_f + O(OPT_f^{3/5}),$$

choosing $\varepsilon := OPT_f^{-2/5}$. □

Appendix



A Computing low-discrepancy colorings by SDP 

Observe that the only non-constructive ingredient we used is the application of the pigeonhole principle
(with an exponential number of pigeons and pigeonholes) in Theorem 3 and Corollary 4 to obtain the
existence of low-discrepancy half-colorings. The purpose of this section is to make this claim constructive.
More precisely we will replace each phase of $\log m$ iteratively found half-colorings by finding a single full
coloring.

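Theorem 11. Let $A \in \mathbb{Q}^{n_A \times m}$, $B \in [-1,1]^{n_B \times m}$, $\Delta_1,\ldots,\Delta_{n_A} > 0$, $\mu_1,\ldots,\mu_{n_B} > 0$ (with $\sum_i \mu_i \le 1$, $m \ge 2$) be
given as input. Assume that $\forall A' \subseteq A : H_\Delta(A') \le \frac{\#\mathrm{col}(A')}{10}$. Then there is a constant $C > 0$ and a randomized
algorithm with expected polynomial running time, which computes a coloring $\chi : [m] \to \{\pm 1\}$ such that
for all $\lambda \ge 0$,

$$\Pr\left[|A_i\chi| > \lambda \cdot C\sqrt{\log m} \cdot \Delta_i\right] \le 4e^{-\lambda^2/2} \quad \forall i = 1,\ldots,n_A$$

$$\Pr\left[|B_i\chi| > \lambda \cdot C/\sqrt{\mu_i}\right] \le 160 \cdot 2^{-\lambda/6} \quad \forall i = 1,\ldots,n_B$$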



Note that $\Delta_i \ge \|A_i\|_\infty$, thus we may rescale $A$ and $\Delta$ so that $\Delta_i \ge 1$ and $\|A\|_\infty \le 1$. We may assume that
$n \ge m$. Furthermore we assume $\lambda \ge 1$ and $m$ is large enough, since otherwise all probabilities exceed
1 for $C$ suitably large and there is nothing to show. The following approach to prove Theorem 11 is an
adaptation of the seminal work of Bansal [Ban10].

A.1 Some preliminaries 

The Gaussian distribution $N(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$ is defined by the density function

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}.$$

If $g$ is drawn from this distribution, we write $g \sim N(\mu, \sigma^2)$. The $n$-dimensional Gaussian distribution
$N^n(0,1)$ is obtained by sampling every coordinate $g_i$ independently from $N(0,1)$. Since $N^n(0,1)$ is
rotationally symmetric, one has

Fact 12. Let $v \in \mathbb{R}^n$ be any vector and $g \sim N^n(0,1)$, then $g^T v \sim N(0, \|v\|_2^2)$.
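Fact 12 can be checked empirically. The following sketch (the function names, sample size and seed are arbitrary choices for illustration) draws samples of $g^T v$ and estimates their variance, which should be close to $\|v\|_2^2$.

```python
import random

def sample_projection(v, rng):
    """Draw g ~ N^n(0, 1) coordinate by coordinate and return g^T v."""
    return sum(rng.gauss(0.0, 1.0) * vi for vi in v)

def empirical_variance(v, samples=20000, seed=0):
    """Estimate the variance of g^T v from independent samples."""
    rng = random.Random(seed)
    draws = [sample_projection(v, rng) for _ in range(samples)]
    mean = sum(draws) / samples
    return sum((d - mean) ** 2 for d in draws) / samples
```

For `v = [3.0, 4.0, 0.0]` the estimate should lie near $\|v\|_2^2 = 25$, up to sampling error.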
Martingales & concentration bounds 

A martingale is a sequence $0 = X_0, X_1, \ldots, X_n$ of random variables with the property that the increment
$Y_i := X_i - X_{i-1}$ has mean $E[Y_i] = 0$. Here $Y_i := Y_i(X_0,\ldots,X_{i-1})$ is allowed to arbitrarily depend on the
previous events $X_0,\ldots,X_{i-1}$. We will make use of the following concentration bound:

Lemma 13 ([Ban10]). Let $0 = X_0,\ldots,X_n$ be a martingale with increments $Y_i$, where $Y_i = \eta_i G_i$, $G_i \sim N(0,1)$
and $|\eta_i| \le \delta$. Then for any $\lambda \ge 0$

$$\Pr\left[|X_n| \ge \lambda\delta\sqrt{n}\right] \le 2e^{-\lambda^2/2}.$$






Semidefinite programming 

A matrix $Y \in \mathbb{R}^{n \times n}$ is termed positive-semidefinite (abbreviated by $Y \succeq 0$), if $x^T Y x \ge 0$ for all $x \in \mathbb{R}^n$ (or
equivalently all eigenvalues of $Y$ are non-negative). Let $S_n = \{Y \in \mathbb{R}^{n \times n} \mid Y^T = Y;\ Y \succeq 0\}$ be the convex
cone of symmetric positive-semidefinite matrices. A semidefinite program is of the form

$$\begin{aligned}
\max\ & C \bullet Y \\
& D_\ell \bullet Y \le d_\ell \quad \forall \ell = 1,\ldots,k \\
& Y \in S_n
\end{aligned}$$

Here $C \bullet Y = \sum_{i=1}^{n}\sum_{j=1}^{n} C_{ij} \cdot Y_{ij}$ is the "vector product for matrices" (also called Frobenius product). In
contrast to linear programs, it is possible that the only feasible solution to an SDP is irrational even if
the input is integral. Furthermore the norm even of the smallest feasible solution might be
doubly-exponential in the input length [Ram95]. Nevertheless, given an error parameter $\varepsilon > 0$ and a ball of
radius $R$ that contains at least one optimum solution (of value $SDP$), one can compute a $Y \in S_n$ with
$C \bullet Y \ge SDP - \varepsilon$ and $D_\ell \bullet Y \le d_\ell + \varepsilon$ for all $\ell = 1,\ldots,k$ in time polynomial in the input length and in
$\log(\max\{1/\varepsilon, R\})$. Since for our algorithm, numerical errors could be easily absorbed into the discrepancy
bounds, we always assume we have exact solutions. The first use of SDPs in approximation algorithms
was the MaxCut algorithm of Goemans and Williamson [GW04]. Later on SDPs were used for example
to approximate graph colorings [KMS98]. We refer to the surveys of [Lov03, Goe97] for more details on
semidefinite programming.

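Using that any symmetric, positive semidefinite matrix $Y$ can be written as $W^T W$ for $W \in \mathbb{R}^{n \times n}$ (and
vice versa), the above SDP is equivalent to a vector program

$$\begin{aligned}
\max\ & \sum_{i=1}^{n}\sum_{j=1}^{n} C_{ij}\, v_i \cdot v_j \\
& \sum_{i=1}^{n}\sum_{j=1}^{n} D^{\ell}_{ij}\, v_i \cdot v_j \le d_\ell \quad \forall \ell = 1,\ldots,k \\
& v_i \in \mathbb{R}^n \quad \forall i = 1,\ldots,n
\end{aligned}$$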
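The correspondence between the matrix form and the vector form can be made concrete: if $v_1,\ldots,v_n$ are the columns of $W$, then $Y = W^T W$ is their Gram matrix and $C \bullet Y$ equals the vector-program objective. A small sketch with hypothetical helper names:

```python
def dot(u, w):
    """Standard inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, w))

def gram(vectors):
    """Gram matrix Y with Y[i][j] = v_i . v_j, i.e. Y = W^T W for columns v_i of W."""
    return [[dot(u, w) for w in vectors] for u in vectors]

def frobenius(C, Y):
    """Frobenius product: sum over i, j of C[i][j] * Y[i][j]."""
    n = len(C)
    return sum(C[i][j] * Y[i][j] for i in range(n) for j in range(n))

def vector_objective(C, vectors):
    """Objective of the vector program: sum over i, j of C[i][j] * (v_i . v_j)."""
    n = len(C)
    return sum(C[i][j] * dot(vectors[i], vectors[j]) for i in range(n) for j in range(n))
```

Both objectives agree by construction, and any Gram matrix is positive semidefinite because $x^T Y x = \|\sum_i x_i v_i\|_2^2 \ge 0$.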

A.2 The algorithm 

Consider the following semidefinite program (footnote 8)

$$\begin{aligned}
\Big\|\sum_{j=1}^{m} A_{ij} v_j\Big\|_2 &\le \Delta_i && \forall i = 1,\ldots,n_A \\
\Big\|\sum_{j=1}^{m} B_{ij} v_j\Big\|_2 &\le G^{-1}\Big(\frac{\mu_i\,|J_{t-1}|}{10}\Big) \cdot \sqrt{|J_{t-1}|} && \forall i = 1,\ldots,n_B \\
\sum_{j=1}^{m} \|v_j\|_2^2 &\ge \frac{|J_{t-1}|}{2} \\
\|v_j\|_2^2 &\le 1 && \forall j \in J_{t-1} \\
v_j &= \mathbf{0} && \forall j \notin J_{t-1} \\
v_j &\in \mathbb{R}^m && \forall j = 1,\ldots,m
\end{aligned}$$

8 More precisely this program is equivalent to a semidefinite program.

Here, $J_{t-1} \subseteq [m]$ denotes the set of active variables at the beginning of step $t$. Initially we have a fractional
coloring $\chi^0 = (0,\ldots,0)$ and all variables are active, i.e. $J_0 = [m]$. For a certain number of iterations, we
sample an increment $\gamma^t$ using a solution from the SDP and add it to $\chi$. If a variable $\chi(j)$ reaches $+1$
or $-1$, then we freeze it (the variable becomes inactive and is removed from $J_t$). Note that the proof of
Theorem 6 guarantees that no matter which variables are currently contained in $J_t$, the SDP is always
feasible.

Let s := ; 1 be a step size and ^ = 20 • log m be the number of iterations. For t = !,...,£ 

repeat the following: 

(1) Compute a solution $\{v_j\}_{j \in [m]}$ to the SDP

(2) Sample $g \sim N^m(0,1)$

(3) Update $\gamma^t(j) := s \cdot g^T v_j$, $\chi^t := \chi^{t-1} + \gamma^t$

(4) If % t {j) £ [1 - \, 1] ([-1,-1 + \\, resp.) then % t {j) := 1 (-1, resp.), j becomes inactive 
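The control flow of steps (1) to (4) can be sketched as below. This is not the algorithm of the paper: solving the SDP is replaced by the placeholder choice $v_j = e_j$ for active $j$ (which satisfies the norm constraints but ignores the discrepancy constraints on $A$ and $B$), the step size and freezing threshold are arbitrary illustrative values, and values are clamped to $[-1, 1]$ as a simplification. All names and parameters are hypothetical.

```python
import random

def random_walk_coloring(m, step=0.05, threshold=0.01, max_iters=200000, seed=0):
    """Gaussian random walk with freezing; returns (coloring, set of still-active indices)."""
    rng = random.Random(seed)
    chi = [0.0] * m                 # fractional coloring, starts at 0
    active = set(range(m))          # J_0 = [m]
    for _ in range(max_iters):
        if not active:
            break
        # Placeholder for step (1): v_j = e_j if j is active, v_j = 0 otherwise.
        # Step (2): with this choice g^T v_j = g_j, so one Gaussian per active j suffices.
        for j in list(active):
            gamma = step * rng.gauss(0.0, 1.0)           # step (3): increment s * g^T v_j
            chi[j] = max(-1.0, min(1.0, chi[j] + gamma))  # clamp (simplifying assumption)
            if chi[j] >= 1.0 - threshold:                 # step (4): freeze near +1
                chi[j] = 1.0
                active.discard(j)
            elif chi[j] <= -1.0 + threshold:              # step (4): freeze near -1
                chi[j] = -1.0
                active.discard(j)
    return chi, active
```

With a genuine SDP solution in place of the placeholder, the increments of different coordinates become correlated, which is what keeps the rows of $A$ and $B$ balanced.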



Note that s- g T vj ~ N(0, s 2 1| vj |||), hence Pr[y f (j) >-^]<2- e -WBMnm)) 12 _ _2_ using Lem ma[Il hence 
we may assume that \x*W I never exceeds 1. No constraint i will ever suffer an extra discrepancy of more 
than n ■ \ « A,- in Step (4), hence we ignore it from now on. 

It was proven in [Ban10] that with high probability, after the last iteration all variables are inactive,
i.e. $\chi^\ell \in \{\pm 1\}^m$.

Lemma 14 ([Ban10]). The probability that within $16/s^2$ iterations, the number of active variables decreases by
a factor of at least 2 is at least 1/2.

Intuitively the reason is the following: consider a variable % l (j) and suppose for simplicity that always 
|| Vj\\2 = 1. Then, the values % Q (]),% l {j)>--- behave essentially like an unbiased random walk in which in 
every step we go either 5 units to the left or to the right. In a block of \ steps, with a constant probability 
we deviate - s steps from 0, i.e. \% l (])\> s-- s = \ and j got frozen at some point. Hence the chance that 
any of the m variables j is not frozen after 20 log m blocks (each of if iterations) can be easily bounded 
by e -ioiogm-o.9 2 /3 < J_ using the chernovbound (e.g. Thm 4.4 in IMU05I ). 

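It remains to bound the discrepancy of $\chi^\ell$. Let $\{v_j^t\}_{j \in [m]}$ be the SDP solution and $g^t$ be the random
Gaussian vector in step $t$. Then

$$A_i\chi^\ell = \sum_{t=1}^{\ell} A_i\gamma^t = \sum_{t=1}^{\ell}\sum_{j=1}^{m} s A_{ij}\,(g^t)^T v_j^t = \sum_{t=1}^{\ell} s\,(g^t)^T\Big(\sum_{j=1}^{m} A_{ij} v_j^t\Big)$$

But $s\sum_{j=1}^{m} A_{ij} v_j^t$ is a vector of length at most $s \cdot \Delta_i$, thus $A_i\chi^\ell$ is a martingale and we can apply Lemma 13
with $\delta := s \cdot \Delta_i$ to bound

$$\Pr\left[|A_i\chi^\ell| \ge \lambda \cdot s\Delta_i \cdot \sqrt{\ell}\right] \le 2e^{-\lambda^2/2}.$$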

However, we need to be a bit more careful to analyze the behavior of $|B_i\chi^\ell|$. In the following, we fix any
index $i \in \{1,\ldots,n_B\}$. The difficulty is that the discrepancy that we allow for row $i$ changes dynamically as
the number of active variables decreases. The sequence of iterations $T_r := \{t \mid 2^r \le |J_{t-1}| < 2^{r+1}\}$, in which
the number of active variables is between $2^r$ and $2^{r+1}$, is termed phase $r$. Let $\delta(r) := G^{-1}(\frac{\mu_i}{10} \cdot 2^r) \cdot 2^{(r+1)/2}$
be an upper bound on the discrepancy bound that is imposed to row $i$ during this phase. Again we can
write

$$|B_i\chi^\ell| = \Big|\sum_{t=1}^{\ell} B_i\gamma^t\Big| = \Big|\sum_{t=1}^{\ell} s\,(g^t)^T\Big(\sum_{j=1}^{m} B_{ij} v_j^t\Big)\Big| \le \sum_{r \ge 0} \underbrace{\Big|\sum_{t \in T_r} s\,(g^t)^T u^t\Big|}_{=: X(r)} \quad\text{with } u^t := \sum_{j=1}^{m} B_{ij} v_j^t$$

and $\|u^t\|_2 \le \delta(r)$ for $t \in T_r$. By $X(r)$ we denote the discrepancy that we suffer in phase $r$. We somewhat expect that
$E[X(r)] = O(\delta(r))$. In fact, this is true even with a strong tail bound.

Lemma 15. For all $r \ge 0$ and $\lambda \ge 0$, $\Pr[X(r) > \lambda \cdot \delta(r)] \le 4 \cdot 2^{-\lambda/6}$.



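Proof. If we suffer a large discrepancy in a single phase, then either the phase lasted much longer than
$O(\frac{1}{s^2})$ iterations or the phase was short but the discrepancy exceeded the standard deviation by a large
factor. However both is unlikely. More formally, for the event $X(r) > \lambda \cdot \delta(r)$ to happen, at least one of the
following events must occur

• $E_1$: $|T_r| > \frac{\lambda}{4} \cdot \frac{16}{s^2}$

• $E_2$: Within the first $\frac{\lambda}{4} \cdot \frac{16}{s^2}$ iterations of phase $r$, a discrepancy of $\lambda \cdot \delta(r)$ is reached.

By Lemma 14 one has $\Pr[E_1] \le (\frac{1}{2})^{\lfloor\lambda/4\rfloor}$. Furthermore

$$\Pr[E_2] \le \Pr\left[\Big|\sum_{\text{first } 4\lambda/s^2 \text{ it. } t \in T_r} s\,(g^t)^T u^t\Big| > \frac{\sqrt{\lambda}}{2} \cdot s\delta(r) \cdot \sqrt{\frac{4\lambda}{s^2}} = \lambda \cdot \delta(r)\right] \le 2e^{-\lambda/8}$$

by again applying Lemma 13 with parameters $\lambda' := \frac{\sqrt{\lambda}}{2}$; $\delta' := s \cdot \delta(r)$; $n' := \frac{4\lambda}{s^2}$. The claim follows since

$$\Pr[X(r) > \lambda \cdot \delta(r)] \le \Pr[E_1] + \Pr[E_2] \le 2^{-\lfloor\lambda/4\rfloor} + 2e^{-\lambda/8} \le 4 \cdot 2^{-\lambda/6}.$$

□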



By the union bound, we could easily bound the probability that any phase $r$ has $X(r) > \lambda \cdot \delta(r)$ by
$4 \cdot 2^{-\lambda/6} \cdot \log m$ and thus $\Pr[|B_i\chi| > \lambda \cdot C/\sqrt{\mu_i}] \le 4 \cdot 2^{-\lambda/6} \cdot \log m$. However, we can avoid the $\log m$ term
by observing that the bound on $|B_i x - B_i y|$ in the proof of Theorem 6 receives the largest contributions
within the small window, when the number of active variables is $\Theta(1/\mu_i)$. Outside of this window, we
have a lot of slack, that we can use here.

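We call a phase $r$ bad, if $|X(r)| > \delta(r) \cdot (\lambda + |r - \log(\frac{60}{\mu_i})|)$ (and good otherwise). Note that the term
$|r - \log(\frac{60}{\mu_i})|$ is indeed minimized if $2^r = \Theta(\frac{1}{\mu_i})$. Then we can upper bound the probability that any phase
is bad by

$$\sum_{r \ge 0} \Pr\left[X(r) > \delta(r) \cdot \Big(\lambda + \Big|r - \log\frac{60}{\mu_i}\Big|\Big)\right] \overset{\text{Lem. 15}}{\le} 2\sum_{z \ge 0} 4 \cdot 2^{-(\lambda+z)/6} \le 80 \cdot 2^{-\lambda/6}$$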



It remains to prove that if all phases $r$ are good, then $|B_i\chi| \le \lambda \cdot O(\sqrt{1/\mu_i})$. Hence we consider

$$|B_i\chi| \overset{(*)}{\le} \sum_{r \ge 0} \delta(r) \cdot \Big(\lambda + \Big|r - \log\frac{60}{\mu_i}\Big|\Big) \overset{(**)}{\le} \sqrt{\frac{120}{\mu_i}} \sum_{z \in \mathbb{Z}} G^{-1}(6 \cdot 2^z) \cdot 2^{z/2}\,(\lambda + |z|) \overset{(***)}{\le} \lambda \cdot \frac{C}{\sqrt{\mu_i}}$$

for some constant $C > 0$. In $(*)$ we assumed that all phases were good and in $(**)$ we substitute $z :=
r - \log_2(\frac{60}{\mu_i})$ (or equivalently $2^z = 2^r \cdot \frac{\mu_i}{60}$). Eventually we recall that already in the proof of Theorem 6
we saw that the series $\sum_{z \in \mathbb{Z}} G^{-1}(6 \cdot 2^z) \cdot 2^{z/2}$ converges geometrically which is not affected by adding a
polynomial term like $|z|$, hence giving $(***)$ (here we also use $\lambda \ge 1$). This almost concludes the proof of
Theorem 11, since with probability at most $\frac{1}{2}$ (for $m$ large enough) the algorithm produces
a failure, i.e. not all variables are frozen after $O(\frac{1}{s^2}\log m)$ iterations. In this case, we simply repeat the
algorithm until it was successful. Then the actual tail bound that we obtain is a conditional probability

$$\Pr\left[|B_i\chi| > \lambda C/\sqrt{\mu_i} \;\middle|\; \text{run successful}\right] \le \frac{\Pr[|B_i\chi| > \lambda C/\sqrt{\mu_i}]}{\Pr[\text{run successful}]} \le 2 \cdot 80 \cdot 2^{-\lambda/6} \qquad (4)$$

(same for $|A_i\chi|$) and the expected running time is polynomial.
A.3 The Constructive Rounding Theorem 

Now that we can compute efficiently full colorings $\chi$ such that $A\chi, B\chi \approx 0$, it is not difficult anymore to
give an algorithmic version of our main theorem.

Theorem 16. Let $A \in \mathbb{Q}^{n_A \times m}$, $B \in [-1,1]^{n_B \times m}$, $\Delta = (\Delta_1,\ldots,\Delta_{n_A}) > 0$, $\mu_1,\ldots,\mu_{n_B} > 0$ ($\sum_i \mu_i \le 1$, $m \ge 2$) and
$c \in [-1,1]^m$ be given as input, such that $\forall J \subseteq \{1,\ldots,m\} : H_\Delta(A^J) \le \frac{|J|}{10}$. Then there is a constant $C' > 0$ and
a randomized algorithm with expected polynomial running time which obtains a $y \in \{0,1\}^m$ such that

• Preserved expectation: $E[c^T y] = c^T x$, $E[Ay] = Ax$, $E[By] = Bx$.

• Bounded difference: $|c^T x - c^T y| \le C'$; $|A_i x - A_i y| \le C'\sqrt{\log n} \cdot \sqrt{\log\min\{n, m\}} \cdot \Delta_i$ for all $i = 1,\ldots,n_A$
($n := n_A + n_B$); $|B_i x - B_i y| \le C'\log(\frac{2}{\mu_i})\sqrt{1/\mu_i}$ for all $i = 1,\ldots,n_B$.

• Tail bounds: For all $\lambda \ge 0$ and all $i$:

- $\Pr[|A_i x - A_i y| > \lambda \cdot C'\sqrt{\log\min\{n, m\}} \cdot \Delta_i] \le 2 \cdot 2^{-\lambda^2}$

- $\Pr[|B_i x - B_i y| > \lambda \cdot C'/\sqrt{\mu_i}] \le 2 \cdot 2^{-\lambda}$.

Proof. Again we can append $c$ as an additional row to $B$, hence we ignore the objective function from
now on. As described in the proof of Theorem 6, we can assume that $m \le n$ and that entries of $x$ have a
finite dyadic expansion with $K$ bits. We perform the following algorithm:

(1) FOR $k := K, K-1, \ldots, 1$ DO

(2) $J := \{j \in \{1,\ldots,m\} \mid x_j\text{'s } k\text{th bit is } 1\}$

(3) Repeat computing $\chi^{(k)} : J \to \{\pm 1\}$ according to Theorem 11 until $\chi^{(k)}$ is good, i.e. until

• $|A_i\chi^{(k)}| \le C' \cdot \sqrt{\log n} \cdot \sqrt{\log m} \cdot \Delta_i$ for all $i = 1,\ldots,n_A$

• $|B_i\chi^{(k)}| \le C'\log(\frac{2}{\mu_i})/\sqrt{\mu_i}$ for all $i = 1,\ldots,n_B$

(4) Update $x := x + (\frac{1}{2})^k \chi^{(k)}$
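The bit-by-bit structure of steps (1) to (4) can be sketched as follows. The sketch assumes that the entries of $x$ are exact `Fraction` multiples of $2^{-K}$ in $[0,1]$, and it replaces the coloring of Theorem 11 and the goodness test of step (3) by independent random signs, so it shows only why the result is integral, not the discrepancy guarantees. The function names are hypothetical.

```python
from fractions import Fraction
import random

def random_coloring(J, rng):
    """Stand-in for Theorem 11: independent uniform signs on the index set J."""
    return {j: rng.choice((-1, 1)) for j in J}

def round_bit_by_bit(x, K, coloring=random_coloring, seed=0):
    """Round x (multiples of 2**-K in [0, 1]) to a 0/1 vector, one bit per phase."""
    rng = random.Random(seed)
    x = list(x)
    for k in range(K, 0, -1):                      # step (1)
        unit = Fraction(1, 2 ** k)
        # step (2): entries are multiples of 2**-k here; the k-th bit is 1 iff the multiple is odd
        J = [j for j, xj in enumerate(x) if (xj / unit) % 2 == 1]
        chi = coloring(J, rng)                     # step (3), without the goodness test
        for j in J:                                # step (4)
            x[j] += unit * chi[j]
    return x
```

After phase $k$ every entry is a multiple of $2^{-(k-1)}$ that still lies in $[0,1]$, so after the last phase all entries are 0 or 1.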

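Let $y$ be the integral vector obtained at the end. For $C'$ large enough, by Theorem 11 each run to compute a
coloring $\chi^{(k)}$ violates each of the bounds in step (3) only with small probability, so that by the
union bound, each run of (3) is good with probability at least $\frac{1}{2}$. By Equation (4) concerning conditional
probabilities, this only worsens the tail bounds provided by Theorem 11 for $B_i\chi^{(k)}$ and $A_i\chi^{(k)}$ by a factor
of at most 2. In any case we have a guarantee that

$$|A_i x - A_i y| \le \sum_{k \ge 1} \Big(\frac{1}{2}\Big)^k |A_i\chi^{(k)}| \le \sum_{k \ge 1} \Big(\frac{1}{2}\Big)^k \cdot C'\sqrt{\log n \cdot \log m} \cdot \Delta_i = C'\sqrt{\log n \cdot \log m} \cdot \Delta_i$$

and analogously $|B_i x - B_i y| \le C'\log(\frac{2}{\mu_i})/\sqrt{\mu_i}$. The algorithm behind Theorem 11 is fully symmetric, i.e.
$E[A\chi^{(k)}] = 0$ for all $k$, hence $E[Ay] = Ax$ (and similarly $E[By] = Bx$). It remains to prove the tail bounds.
We may assume that $\lambda \ge 1$, since otherwise, the desired probabilities $2 \cdot 2^{-\lambda^2}$ and $2 \cdot 2^{-\lambda}$ exceed 1 anyway.
Note that if $|B_i\chi^{(k)}| \le \frac{(\lambda + k)\,C'}{4\sqrt{\mu_i}}$ holds for all $k$, then

$$|B_i x - B_i y| = \Big|\sum_{k \ge 1} \Big(\frac{1}{2}\Big)^k B_i\chi^{(k)}\Big| \le \sum_{k \ge 1} \Big(\frac{1}{2}\Big)^k \cdot \frac{(\lambda + k)\,C'}{4\sqrt{\mu_i}} \le \lambda \cdot \frac{C'}{\sqrt{\mu_i}}$$

since $\sum_{k \ge 1} (\frac{1}{2})^k \le 2$, $\sum_{k \ge 1} (\frac{1}{2})^k k \le 2$ and $\lambda \ge 1$. Hence we can use the bound

$$\Pr\left[|B_i x - B_i y| > \lambda \cdot \frac{C'}{\sqrt{\mu_i}}\right] \le \sum_{k \ge 1} \Pr\left[|B_i\chi^{(k)}| > \frac{(\lambda + k)\,C'}{4C} \cdot \frac{C}{\sqrt{\mu_i}}\right] \overset{\text{Thm. 11}}{\le} \sum_{k \ge 1} 320 \cdot 2^{-\frac{(\lambda + k)C'}{24C}} \le 2 \cdot 2^{-\lambda}$$

for $C'$ large enough. Similarly

$$\Pr\left[|A_i x - A_i y| > \lambda \cdot C'\sqrt{\log m} \cdot \Delta_i\right] \le \sum_{k \ge 1} \Pr\left[|A_i\chi^{(k)}| > \frac{(\lambda + k)\,C'}{4C} \cdot C\sqrt{\log m} \cdot \Delta_i\right] \overset{\text{Thm. 11}}{\le} \sum_{k \ge 1} 8e^{-\frac{(\lambda + k)^2 C'^2}{32C^2}} \le 2 \cdot 2^{-\lambda^2}$$

again for $C'$ large enough and $\lambda \ge 1$.

□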



B How to solve the LP relaxations 

All linear programs for which we provided rounding procedures were of the form $\min\{c^T x \mid \sum_{S \in \mathcal{S}} x_S \mathbf{1}_S =
\mathbf{1},\ x \ge 0\}$, i.e. they all have an exponential number of variables. So, we should explain how such programs
can be solved. In fact, the first polynomial time algorithm was proposed by [KK82] in the case of Bin
Packing. Their approach solves the dual $\max\{\sum_{i=1}^{n} y_i \mid \sum_{i \in S} y_i \le c_S\ \forall S \in \mathcal{S}\}$ up to an arbitrarily small
additive error using the Grötschel-Lovász-Schrijver variant of the Ellipsoid method [GLS81]. The error
term cannot be avoided, since a Partition instance could be decided by inspecting whether $OPT_f \le
2$ or not. The only additional prerequisite for the Karmarkar-Karp algorithm is an FPTAS for the dual
separation problem (i.e. given dual prices $y_1,\ldots,y_n \ge 0$, find a $(1 - \varepsilon)$-approximation to $\max\{\frac{1}{c_S}\sum_{i \in S} y_i \mid
S \in \mathcal{S}\}$). Note that the same result is implied by the framework of Plotkin, Shmoys and Tardos [PST95]
without using general LP solvers.

It follows implicitly from both papers [KK82, PST95] that for any set family $\mathcal{S} \subseteq 2^{[n]}$ that admits an
FPTAS for the dual separation problem, the corresponding column-based LP can be solved within an
arbitrarily small additive error.

However, we are not aware of an explicit proof of this fact in the literature. Hence, to be self-contained
we provide all the details here. Our focus lies on giving a short and painless analysis, rather than giving
the best bounds on the running time. Our starting point is the following theorem from [PST95]
(paraphrased to make it self-contained).

Theorem 17 (Plotkin, Shmoys, Tardos [PST95]). Let $A \in \mathbb{R}^{n \times m}$ be a matrix and $P \subseteq \mathbb{R}^m$ be a convex set.
Given $0 < \varepsilon < 1$, $n \in \mathbb{N}$, $b \in \mathbb{Q}^n$, $\rho \ge \max_{i=1,\ldots,n}\max_{x \in P} \frac{A_i x}{b_i}$ as input. Then there exists an algorithm
Cover which either computes an $x \in P$ with $Ax \ge (1 - \varepsilon)b$ or asserts that there is no $x \in P$ with $Ax \ge b$.
This algorithm calls $K := O(n + \rho\log^2(n) + \frac{\rho}{\varepsilon^2}\log(\frac{n}{\varepsilon}))$ many times the following oracle (with $\varepsilon' \in \{\varepsilon/2, 1/6\}$)

Subroutine: Given $0 < \varepsilon' < 1$, $y \in \mathbb{R}^n_{\ge 0}$ as input. Find an $\tilde{x} \in P$ such that $y^T A\tilde{x} \ge (1 - \varepsilon')\max\{y^T Ax \mid
x \in P\}$.

Assuming that for any $\tilde{x}$, the vector $A\tilde{x}$ can be evaluated in time $O(n)$, the additional running time of
Cover is $O(K \cdot n)$.

On an intuitive level, the algorithm of [PST95] maintains at any iteration some vector $x \in P$. Then
for every element $i \in [n]$ one defines certain dual prices $y_i$ which are decreasing in $\frac{A_i x}{b_i}$. In other words,
uncovered elements will receive a high dual price $y_i$; covered ones receive a low price. Then one
computes a vector $\tilde{x}$ which (approximately) maximizes the dual prices, meaning that $\tilde{x}$ has a large incentive
to cover elements $i$ with $A_i x \ll b_i$. Then one replaces $x$ by a convex combination of $x$ and $\tilde{x}$ and iterates.

In the following we show how Theorem 17 can be used to solve (3).

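Theorem 18. Let $\mathcal{S} \subseteq 2^{[n]}$ be a family of sets with cost function $c : \mathcal{S} \to\ ]0,1]$ (assume $c(S)$ can be evaluated
in time $O(n)$) such that for any $y \in \mathbb{Q}^n_{\ge 0}$ given as input, one can find an $S^* \in \mathcal{S}$ with $\frac{1}{c_{S^*}}\sum_{i \in S^*} y_i \ge (1 - \varepsilon) \cdot
\max\{\frac{1}{c_S}\sum_{i \in S} y_i \mid S \in \mathcal{S}\}$ in time $T(n, \varepsilon)$. Then for any given $n/2 \ge \delta > 0$, one can find a basic solution $x$ of
the LP

$$OPT_f = \min\Big\{c^T x \;\Big|\; \sum_{S \in \mathcal{S}} x_S \mathbf{1}_S \ge \mathbf{1},\ x \ge 0\Big\}$$

of cost $c^T x \le OPT_f + \delta$ in time $O(\frac{n^3}{\delta^2}\ln(\frac{n}{\delta})) \cdot (\frac{n}{\delta}T(n, \Omega(\delta/n)) + n^3)$.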

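Proof. Since no set costs more than 1, one has $OPT_f \le n$. By trying out $O(n/\delta)$ values, we may assume
to know a value $r$ with $OPT_f \le r \le OPT_f + \frac{\delta}{2}$. We define $P = \{x \in \mathbb{R}^{\mathcal{S}}_{\ge 0} \mid \sum_{S \in \mathcal{S}} c_S x_S = r\}$ and a matrix (footnote 9)
$A \in \{0,1\}^{n \times \mathcal{S}}$ by

$$A_{iS} := \begin{cases} 1 & i \in S \\ 0 & \text{otherwise} \end{cases}$$

as well as $b = (1,\ldots,1)$. We choose

$$\rho := n \ge r \ge \max_{i=1,\ldots,n}\max_{x \in P} \frac{A_i x}{b_i}$$

and $\varepsilon := \frac{\delta}{4n}$. The next step is to design the Subroutine. Hence, let a vector of dual prices $y \in \mathbb{Q}^n_{\ge 0}$ and a
parameter $\varepsilon' > 0$ be given as input. Then we compute a set $S^* \in \mathcal{S}$ with $\frac{1}{c_{S^*}}\sum_{i \in S^*} y_i \ge (1 - \varepsilon') \cdot \max\{\frac{1}{c_S}\sum_{i \in S} y_i \mid
S \in \mathcal{S}\}$ in time $T(n, \varepsilon')$. Observe that the vertices of $P$ are of the form $\frac{r}{c_S} e_S$ and

$$y^T A\Big(\frac{r}{c_S} e_S\Big) = \frac{r}{c_S}\sum_{i=1}^{n} y_i A_{iS} = \frac{r}{c_S}\sum_{i \in S} y_i$$

hence, the vector $\tilde{x} := \frac{r}{c_{S^*}} \cdot e_{S^*}$ is the desired $(1 - \varepsilon')$-approximation for $\max\{y^T Ax \mid x \in P\}$.

Applying Theorem 17 yields a vector $x \in P$ with $Ax \ge (1 - \varepsilon)\mathbf{1}$. Hence the slightly scaled vector $x' = \frac{x}{1 - \varepsilon}$
is feasible and has cost $\frac{r}{1 - \varepsilon} \le (OPT_f + \frac{\delta}{2}) \cdot (1 + 2\varepsilon) \le OPT_f + \delta$. To turn $x'$ into a basic solution,
we consider $x''$ with $x''_S = x'_S > 0$ for precisely $n + 1$ many sets (and $x''_S = 0$ otherwise). Then by Gauss
elimination we find an $x''' \ge 0$ with $Ax''' = Ax''$, $|\mathrm{supp}(x''')| \le n$ and $c^T x''' \le c^T x''$ in time $O(n^3)$ and replace
the corresponding values in $x'$ by those in $x'''$. After iterating this at most $K$ times, we obtain the desired
basic solution. The total running time is

$$\frac{n}{\delta} \cdot K \cdot T(n, \Omega(\varepsilon)) + O(K \cdot n^3) \le O\Big(\frac{n^3}{\delta^2}\ln\Big(\frac{n}{\delta}\Big)\Big) \cdot \Big(\frac{n}{\delta}\,T(n, \Omega(\delta/n)) + n^3\Big)$$

since Subroutine was called at most $K = O(\frac{n^3}{\delta^2}\ln(\frac{n}{\delta}))$ times. □

9 $A$ and $P$ are defined, but not explicitly computed.

Here the running times are w.r.t. the RAM model, where any arithmetic operation accounts with unit
cost.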



Applications for considered problems 

In order to solve the considered LPs up to any additive error term, it suffices to provide an FPTAS for each 
of the corresponding dual separation problems. 

• Bin Packing: The dual separation problem is $\max\{\sum_{i \in S} y_i \mid \sum_{i \in S} s_i \le 1\}$ which is known as the Knapsack
problem and admits an FPTAS in time $T(n, \varepsilon) = O(n/\varepsilon^2)$ (see e.g. [Vaz01]).

• Bin Packing With Rejection: We compute a $(1 - \varepsilon)$-approximation $S^*$ to $\max\{\sum_{i \in S} y_i \mid \sum_{i \in S} s_i \le
1\}$ in time $O(n/\varepsilon^2)$ and compare it to the values $\frac{y_i}{\pi_i}$ for $i = 1,\ldots,n$ and output either $S^*$ or some $\{i\}$,
whoever yields the largest value, hence again $T(n, \varepsilon) = O(n/\varepsilon^2)$.

• Train Delivery: For any $k \in \{1,\ldots,n\}$, let $S^k$ be a $(1 - \varepsilon)$-approximate solution to $\max\{\sum_{i \in S} y_i \mid
\sum_{i \in S} s_i \le 1,\ S \subseteq \{i \mid p_i \le p_k\}\}$. We output the set $S^k$ maximizing $\frac{1}{p_k}\sum_{i \in S^k} y_i$. This can be done in time
$T(n, \varepsilon) = O(n^2/\varepsilon^2)$.
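The dual separation step for Bin Packing With Rejection can be sketched as follows. To keep the example short and exact, the Knapsack FPTAS is replaced by exponential-time brute force, which is an assumption for illustration only; items are 0-based, bin patterns have cost 1, the rejection pattern $\{i\}$ has cost `penalties[i]`, and all names are hypothetical.

```python
from itertools import combinations

def best_bin_pattern(y, sizes):
    """Brute-force stand-in for the Knapsack FPTAS: best set of total size <= 1."""
    n = len(y)
    best_value, best_set = 0.0, frozenset()
    for r in range(1, n + 1):
        for S in combinations(range(n), r):
            value = sum(y[i] for i in S)
            if sum(sizes[i] for i in S) <= 1 and value > best_value:
                best_value, best_set = value, frozenset(S)
    return best_value, best_set

def separation_oracle_with_rejection(y, sizes, penalties):
    """Return (kind, pattern, ratio) maximizing (sum of y over pattern) / cost."""
    bin_value, bin_set = best_bin_pattern(y, sizes)       # bin patterns cost 1
    i_best = max(range(len(y)), key=lambda i: y[i] / penalties[i])
    reject_ratio = y[i_best] / penalties[i_best]          # rejection pattern {i} costs penalties[i]
    if reject_ratio > bin_value:
        return ("reject", frozenset([i_best]), reject_ratio)
    return ("bin", bin_set, bin_value)
```

With sizes `[0.5, 0.5, 0.25]` and prices `[0.5, 0.25, 0.5]`, a cheap rejection of the last item (penalty 0.25) beats every bin pattern, whereas with unit penalties the bin pattern `{0, 2}` wins.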



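C Omitted proof for Lemma 5

Lemma (Lemma 5). Let $\alpha \in \mathbb{R}^m$ be a vector and $\Delta > 0$. For $\lambda = \frac{\Delta}{\|\alpha\|_2}$,

$$H_{\chi \in \{\pm 1\}^m}\left(\left\lceil \frac{\alpha^T\chi}{2\Delta} \right\rfloor\right) \le G(\lambda) := \begin{cases} 9e^{-\lambda^2/5} & \text{if } \lambda \ge 2 \\ \log_2(32 + 64/\lambda) & \text{if } \lambda < 2 \end{cases}$$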
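For small $m$ the statement can be checked by brute force. The sketch below evaluates $G$ as defined above and computes the exact entropy (in bits) of the rounded random sum by enumerating all $2^m$ sign vectors. The function names are hypothetical, and ties are rounded upwards, which is one possible convention for $\lceil\cdot\rfloor$.

```python
import math
from itertools import product

def G(lam):
    """The bound G(lambda) from Lemma 5."""
    if lam >= 2:
        return 9 * math.exp(-lam ** 2 / 5)
    return math.log2(32 + 64 / lam)

def rounded_sum_entropy(alpha, delta):
    """Exact entropy of round(alpha^T chi / (2 * delta)) for uniformly random signs chi."""
    counts = {}
    for chi in product((-1, 1), repeat=len(alpha)):
        z = math.floor(sum(a * c for a, c in zip(alpha, chi)) / (2 * delta) + 0.5)
        counts[z] = counts.get(z, 0) + 1
    total = 2 ** len(alpha)
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```

For the all-ones vector with $m = 10$ and $\lambda \in \{0.5, 1, 2, 3\}$, the computed entropy stays well below $G(\lambda)$.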



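Proof. We distinguish 2 cases. Case $\lambda \ge 2$. Let $Z := \lceil \frac{\alpha^T\chi}{2\Delta} \rfloor$ and $p_i := \Pr[Z = i]$. Note that $X := \alpha^T\chi = \sum_{j=1}^{m} \alpha_j \cdot \chi_j$ is the
sum of independently distributed random variables $\chi_j \cdot \alpha_j = \pm|\alpha_j|$ with mean 0. For $i \ge 1$,

$$p_i \le \Pr[X \ge (2i-1)\Delta] \le 2e^{-(2i-1)^2\lambda^2/2} \overset{\lambda \ge 2}{\le} e^{-(2i-1)^2\lambda^2/4}.$$

The entropy stemming from $i \ge 1$ is fairly small, namely

$$\sum_{i \ge 1} p_i \log_2\Big(\frac{1}{p_i}\Big) \le \sum_{i \ge 1} e^{-(2i-1)^2\lambda^2/4} \cdot \log_2\Big(\frac{1}{e^{-(2i-1)^2\lambda^2/4}}\Big) = \sum_{i \ge 1} e^{-(2i-1)^2\lambda^2/4} \cdot \frac{(2i-1)^2\lambda^2}{4 \cdot \ln(2)} \le 3e^{-\lambda^2/5}.$$

Here we use that $p_i \le \frac{1}{e}$ and $x \cdot \log_2(\frac{1}{x})$ is monotone increasing for $x \in [0, \frac{1}{e}]$. Furthermore $p_0 \ge 1 - \Pr[|X| \ge
\Delta] \ge 1 - 2e^{-\lambda^2/2}$, hence the event $\{Z = 0\}$ is so likely that it also does not contribute much entropy:

$$p_0 \log_2\Big(\frac{1}{p_0}\Big) \le 2 \cdot (1 - p_0) \le 4e^{-\lambda^2/2} \le 2e^{-\lambda^2/5}.$$

Adding up also the entropy for $i < 0$, we obtain $H(Z) \le 8 \cdot e^{-\lambda^2/5}$.

Case $\lambda < 2$. Define $L := \lceil \frac{2}{\lambda} \rceil \ge 1$ and $\Delta' := \Delta \cdot L$. Then we can express $Z = L \cdot \lceil \frac{\alpha^T\chi}{2\Delta'} \rfloor + Z''$ such that $Z''$
attains just $L$ different values (and hence $H(Z'') \le \log_2(L)$). Let $\lambda' := \Delta'/\|\alpha\|_2 \ge 2$, then

$$H(Z) \overset{(*)}{\le} H\Big(\Big\lceil \frac{\alpha^T\chi}{2\Delta'} \Big\rfloor\Big) + H(Z'') \overset{(**)}{\le} \underbrace{9e^{-\lambda'^2/5}}_{\le 5} + \log_2(L) \le 5 + \log_2\Big(\frac{2}{\lambda} + 1\Big) = \log_2\Big(32 + \frac{64}{\lambda}\Big).$$

In $(*)$, we use the subadditivity of the entropy function. In $(**)$ we use that $H(\lceil \frac{\alpha^T\chi}{2\Delta'} \rfloor)$ can be bounded by
case (1).

□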






